ABSTRACT


LAND USE/LAND COVER MAPPING (1:25,000) OF TAIWAN, REPUBLIC OF CHINA 
BY AUTOMATED MULTISPECTRAL INTERPRETATIONS OF LANDSAT IMAGERY 

The applicability of digital computer-aided analysis techniques 
of LANDSAT images to identify and classify major land cover types 
of Taiwan was tested with a minimal amount of ground control data 
extracted from black and white airphotos by photointerpretation. A 
limited study area was selected to represent the wide spectrum of 
land covers present in Taiwan. A land use/land cover classification 
scheme was evolved in a step by step fashion for use with airphotos 
and LANDSAT imagery. The single date LANDSAT image taken on 
Nov. 1, 1972 was analyzed using supervised computer image proc- 
essing techniques. 

Because representative ground control data are difficult to collect
retrospectively, three methods were tested for collecting the training
sets needed to establish the "spectral signatures" of the land uses/land
covers sought. The computer preprocessing techniques applied to the
digital images to improve the final classification results were
geometric corrections, spectral band or image ratioing, and statistical
cleaning of the representative training sets. The geometric corrections
provided a map base at 1:25,000 with position errors only slightly more
than 50 feet. The statistical cleaning of the representative training
sets did not improve the training set accuracy. However, a final
evaluation of the value of statistical cleaning must await a future
test of its impact upon map verification accuracy.
A stepwise discriminant analysis was applied to evaluate the training
set accuracy for 17 land uses/land covers. Ratios of MSS bands
contributed little to the final accuracy achieved. MSS bands 5 and 7
achieved an overall training set accuracy of 79%, which is comparable
to that obtained with 10 MSS bands/ratios.

A minimal level of statistical verification was made based upon
comparisons between the airphoto estimates and the classification
results. The verification provided further support for the selection
of MSS bands 5 and 7. It also indicated that the maximum likelihood
ratioing technique can achieve classification results in closer
agreement with the airphoto estimates than the stepwise discriminant
analysis.

Subsequently, final land use/land cover classification maps
were produced at a scale of 1:25,000 at a cost of N.T. $0.16/
hectare (U.S. $0.004/hectare) for computer time only. A further
verification of the classification maps needs to be done in the field.

An application of land use/land cover mapping over the entire island is
strongly recommended by the author. However, the validity of signature
extension over the entire island should be further investigated.

I. INTRODUCTION


1.1 Background 

It becomes increasingly important to manage Taiwan's natural 
resources more efficiently as pressure upon them increases due to a 
growing population with expectations of ever rising standards of 
living. The task requires that accurate inventories of the spatial 
distribution of natural resources be periodically completed in a 
timely fashion. Until as recently as a generation ago such inven- 
tories were made almost entirely on the ground for Taiwan. Geolo- 
gists traveled widely in exploring for minerals; foresters and agron- 
omists examined trees and crops at close hand in order to assess 
their condition; surveyors walked the countryside in the course of 
preparing the necessary large scale topographic maps. The advent 
of the collection of aerial photography in Taiwan represented a big 
step forward. However, the airphotos have not gained wide use for 
natural resource management as they are sensitive material and are 
classified. 

Remote sensing methodology, of which the use of aerial photography
is a subset, uses images collected singly or simultaneously in
spectral ranges distributed over the electromagnetic spectrum. The
technique employs images taken throughout the spectrum from the very
short wavelengths at which gamma rays are emitted to the comparatively
long wavelengths at which RADAR operates. These images can potentially
secure far more information about the nature and condition of an
area's resources than can be obtained with conventional aerial
photography, which is restricted approximately to the sensitivity
range of human vision and a little beyond into the photo
infrared. Remote sensing images are obtained from aircraft or
spacecraft, including unmanned satellites. This technique employs 
both cameras and a large number of other more recent sensing de- 
vices. Remote sensing techniques are currently being extended so 
that the image obtained by the sensing devices can be processed and 
interpreted automatically and a large volume of information dealt with 
in a rapid and timely fashion. 

NASA launched ERTS-1 (renamed LANDSAT-1) into a near- 
polar orbit on 23 July 1972 to remotely sense the surface of the earth 
with a multispectral scanner and a set of three return beam vidicons. 
Multispectral imagery obtained from this spacecraft, with its capac- 
ity for repetitive coverage and synoptic view, has provided a better 
tool to monitor the dynamic nature of the earth's natural resources. 
This characteristic is particularly important to the dynamic proc- 
esses of agricultural and forest management and land use planning. 
Often, for example, a land use map is already out of date when it is
first published. The use of LANDSAT type images can complement the
map revision procedures, providing a capability to monitor trends
in land utilization on a nearly real-time basis (Place, 1973).

Recently, Taiwan has investigated the potential use of the 
larger scope of remote sensing data collection for natural resource 
inventories (Chang, 1974; Miller, 1974; Pan, 1974; and Wang, 1974). 
Four excellent LANDSAT images of Taiwan were taken on November 1, 
1972 with over 90% cloud free conditions. Ninety percent of the land 
area of the island was covered by the center two images of the four. 
The direct visual interpretation of LANDSAT imagery of Taiwan was 
first applied to regional geologic studies (Wang, 1976). Density 
slicing and image enhancement techniques were also tested to help 
delineate special features. However, the inadequate scale of the 
photographic format of the LANDSAT images and its attendant limita- 
tions on the spatial resolution discouraged Taiwan resource managers. 
The question often raised was, "Does LANDSAT imagery have
sufficient spatial resolution for resource management in Taiwan?"

Taiwan is a mountainous island, heavily vegetated with subtropical
forests. The land use patterns are small and complicated. Thus,
Taiwan requires detailed spatial information at a larger scale for
resource management purposes. LANDSAT imagery in the photographic
form does not currently meet these requirements. Recently, special
processing and photographic display have been prepared for test
portions of LANDSAT images of selected small sites in the United
States, yielding results comparable to high altitude airphotos. These
promising results are far superior to the standard LANDSAT products
commercially available for Taiwan. The digital form of the LANDSAT
images contains considerably more spatial information than is
contained in the more commonly available LANDSAT photographic form,
since some of the original image's resolution is lost in the
photographic reproduction processes currently applied. The digital
form of these images also lends itself to computer-aided methods of
interpretation. The resulting map product can be made through
computer processing techniques at a proper scale for resource
management. Moreover, the computer multispectral analysis approach
significantly improves the amount and accuracy of the interpretation
relative to direct visual interpretation of the LANDSAT photographs.

Computer processing of remote sensing imagery (automated 
image processing) has been widely tested by many disciplines for 
such applications as urban planning, crop and forestry inventory, 
water resource management, geologic mapping, etc. This previous
work established the value of computer processing of LANDSAT
images for resource inventories; however, most of this work was
restricted to areas of minimal topographic relief. Larger variations
in topographic slope and aspect may significantly affect the accuracy
of the analysis of LANDSAT data collected over mountainous terrain
(Hoffer, 1974). The ground resolution for LANDSAT imagery is
about 60 by 80 meters. Each resolution cell represents an averaging
of the spectral return from this nominal 60 by 80 meter ground cell.
The land use pattern of Taiwan is small scale and heterogeneous and
considerably more complicated than that of the United States, where
agricultural practices employ large homogeneous fields. Thus, the
reliability of the identification of the varieties of Taiwan land use
and land cover by computer analysis of LANDSAT imagery must be
specifically tested.

A remote sensing program in Taiwan was initiated on several 
fronts in the early 1970s (Miller, Chang and Wang, 1974). Although 
the advantages and disadvantages of satellite imagery were well 
recognized by these programs, the potential use of the digital 
LANDSAT imagery could not be demonstrated. Recently a more de- 
tailed and more expensive approach to crop and forest inventory 
of Taiwan was initiated using infrared aerial photographs of the 
coastal plains. The study described here investigated the possibility 
of applying computer image analysis techniques to the available 
LANDSAT images of Taiwan. The specific purpose was to extract 
land cover maps for use in land use planning and thus to provide an 
additional stepping stone to promote the use of remote sensing in
Taiwan.

1.2 Study Objectives

This study was designed to test the applicability of digital 
computer-aided analysis techniques of LANDSAT multispectral 
scanner images to identify and classify major land cover types of 
Taiwan with a minimal amount of ground control data. The specific 
objectives of this study were: 

1. to develop a land use/land cover hierarchical classification 
system for use with remote sensing data, 

2. to design a practical method to collect the ground control 
data needed to establish the training sets used in the computer classi- 
fication processes, 

3. to select the optimal combination of image spectral bands 
and ratios for use in LANDSAT mapping of Taiwan land cover types 
evaluated in terms of the accuracy and economies of the process, 

4. to produce land cover maps by computer classification of 
LANDSAT imagery at a scale of 1:25,000, and 

5. to design a reasonable scheme for the verification of the 
accuracy of these results using limited ground control information. 

The general objective of this study was to demonstrate by 
example the nature and capability of automated image processing of 
remote sensing imagery collected from aircraft and satellite. 

It is hoped that a review of the results of this research will
allow these new remote sensing techniques to be more fully
appreciated and the approach more widely employed in Taiwan in the
near future.

1.3 Study Outline

This study was carried out at Colorado State University over the
past year using the computer software system entitled the LANDSAT
Mapping System (LMS) operating on a Control Data Computer
(CDC 6400) (Appendix A). A limited study area was selected to
represent the wide spectrum of land cover present in Taiwan. A land
use/land cover classification scheme was evolved in a step by step 
fashion for use with airphotos and LANDSAT imagery. The only 
available, good quality, single date LANDSAT multispectral image 
taken on November 1, 1972 was analyzed using supervised computer 
image processing techniques. Three methods were tested for the 
collection of the training sets needed to establish the "spectral signa- 
tures" of the land uses sought due to the difficulties of retrospective 
collection of representative ground control data. Computer pre- 
processing techniques were applied to the digital images to improve 
the final classification results. The impacts of the application of 
these techniques on the test accuracies achieved were investigated in 
detail. The techniques applied included geometric corrections, 
spectral band or image ratioing and statistical cleaning of the
representative training sets. The investigation led to the optimal
selection of spectral bands or images and their ratios. Subsequently,
final land cover classification maps were produced at a scale of 1:25,000
using the favorable techniques and imagery. A computer cost evalu- 
ation of these processes was completed relative to the time and dollar 
costs of using the Colorado State University CDC 6400 computer. The 
statistical verification of the accuracy of the classification map re- 
sults was hard to complete within the United States due to the paucity 
of ground control data. However, a minimal level of statistical veri- 
fication was possible to substantiate the quality of the maps produced. 


II. DEVELOPMENT OF A HIERARCHICAL LAND USE/LAND
COVER CLASSIFICATION SCHEME FOR TAIWAN


2.1 Description of Study Area 

2.1.1 Introduction 

Taiwan island is 394 kilometers long and 144 kilometers 
broad at the widest point. It lies between 21°45' and 25°38' north 
latitude and 119°18' and 122°7' east longitude with an area of 
35,961 square kilometers (Fig. 2.1). The very dominant topograph- 
ic feature of Taiwan is the central range of high mountains running 
from the northeast corner to the southern tip of the island. This 
backbone of the island contains 62 peaks with elevations above 3000
meters, which rise abruptly from the sea along the eastern
Pacific coast. The western half of the island, facing the China
mainland and the shallower Taiwan Strait, is a terraced succession of
uplands and coastal plains and basins. Approximately one-third of the
total land area of 35,961 square kilometers is arable and occurs on the
gentler western slopes. The mountain areas are forested or heavily
revegetated where the forests have been removed.

Fig. 2.1. GEOGRAPHIC POSITION OF TAIWAN AND THE AREA
SELECTED FOR LAND USE/LAND COVER MAPPING.
The three squares constitute the general study area and
indicate the three 1/50,000 topographic maps to be
mapped at a scale of 1/25,000. Scale ~ 1/2,500,000.

2.1.2 Physiography

The study site is located on the western side of central Taiwan
and covers 2,100 square kilometers, which is about 6% of the total
land area. It constitutes the area of three 1:50,000 topographic
maps, namely the Lu-Kang, Taichung and Kuo-Hsing maps, respectively
from west to east (Fig. 2.1). Beginning at the west coast of
the island at the Taiwan Strait, the area selected extends eastward
through the coastal plains, terrace tablelands and Taichung basin to
the foothills of the central range (Fig. 2.2). The area was selected
to contain a complete sampling of the types of land uses practiced in
Taiwan. Variations in land use do occur in a north-south sense over
Taiwan but are considerably less than those differences induced by
the topographical extremes represented in the site selected. The
elevation of the site increases eastward from sea level to 2307 meters,
which is the highest peak in the general area. The Wu river dominates
the drainage and winds its way across the site to the Taichung
basin, cutting through the tableland to reach the sea at the upper left
corner of the area. The average elevation of the tableland is about
200 meters above sea level.

The climate of central Taiwan is subtropical. The mean
monthly temperature in winter is above 15°C except for the mountain
region. This is favorable for the cultivation of rice and other crops,
including sugar cane, pineapples and bananas. The mean annual
precipitation of the area is around 1.7 meters. The wet season,
from May to September, contributes almost 80% of the annual
precipitation. Typhoons and thundershowers are the major types of
rainfall. The winter monsoon prevails from October to March with
mean maximum wind velocity as high as 17 m/sec. The monsoon
can cause heavy damage to field crops on the coastal plains. A
typhoon or thundershower may cause a flood and very severe erosion
and redeposition due to the precipitous terrain and the short steep
river courses.

Fig. 2.2. THE DRAINAGE PATTERN AND MAJOR LAND USES OF THE STUDY
AREA. Prepared by photointerpretation from 9" X 9" color print of
LANDSAT-1 imagery of November 1, 1972. Transferred to the map base
by the aid of a Zoom Transfer Scope. Scale ~ 1/300,000.

2.1.3 Agriculture 

The study site contains the central portion of the Taichung 
basin and was selected to be representative of Taiwan in physio- 
graphy and agriculture. The site contains a sample of the coastal 
plains, terrace tablelands, and basin and foothills where most of the 
major economic activities of the island occur. Taichung basin is 
where the provincial government of Taiwan is located and contains 
some of the most productive agricultural lands. Taichung is the 
third largest city in Taiwan and is the cultural, economic, industrial, 
and recreational center of central Taiwan. Taichung Harbor has 
recently been built to the north of the Wu River mouth. The study 
area contains these features and thus is an area which will play an 
increasingly important role in the future economic development of
the Republic of China.

The important agricultural products of the basin area are
rice, sugar cane and sweet potatoes and the miscellaneous crops 
include corn, sorghum, peanuts, etc. Bananas and grapes are the 
common fruits planted in the coastal low lands, while oranges and 
other fruits are cultivated in the foothills. However, most upland 
and lowland orchards are small areas scattered throughout the rice 
paddies or among other natural tree growth. Deciduous trees are 
the major forest type in the basin area. Acacia is the dominant type, 
especially on the terrace tablelands. Conifers dominate the moun- 
tain slopes with elevations above 1000 meters. 

Tidal flats extend a few kilometers into the sea during low tide 
and 820 hectares of reclaimed land have been developed in this area. 
Fish ponds and rice paddies are the main type of land uses in and 
adjacent to these tidal flats. 

2.2 LANDSAT Multispectral Scanner Imagery of Taiwan
Available for Analysis

2.2.1 Introduction to the LANDSAT System 

NASA launched the Earth Resources Technology Satellite-1
(subsequently renamed LANDSAT-1) into a near-polar,
sun-synchronous, circular orbit on July 23, 1972. It achieved a
successful orbit of about 920 kilometers (570 miles) above the surface
of the earth, circling the globe every 103 minutes or 14 times a day
(LANDSAT Users Handbook, 1976). LANDSAT-1 is able to view the
same spot anywhere on the surface of the earth at the same local
time of day every 18 days. Subsequently LANDSAT-2 was launched into a
similar orbit so that imagery became available alternately with a 9
day interval. About February 1, 1977 the orbit of LANDSAT-1 was
readjusted so that the intervals between the imaging paths of the two
satellites are now 6 days and 12 days. These satellites both carry a
television camera system (Return Beam Vidicon or RBV) and a 
radiometric scanner (Multispectral Scanner or MSS) which together 
obtain imagery in seven different optical spectral ranges of visible 
and photoinfrared energy reflected from the earth's surface. Four 
spectral ranges are covered by the Multispectral Scanner (MSS) 
imagery (Table 2.1). 
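
The orbit and revisit figures quoted above follow from simple
arithmetic on the 103 minute period and the 18 day repeat cycle. The
short Python sketch below is an illustration only: the function names,
and the simplification that the second satellite trails the first by a
fixed number of days within the same cycle, are assumptions made here
and are not taken from the LANDSAT Users Handbook.

```python
def orbits_per_day(period_minutes):
    """Number of orbits completed in 24 hours for a given orbital period."""
    return 24 * 60 / period_minutes


def revisit_gaps(cycle_days, offset_days):
    """Alternating gaps (in days) between passes over one ground path when a
    second satellite trails the first by offset_days in the same cycle.
    The fixed-offset model is an illustrative assumption."""
    offset = offset_days % cycle_days
    return offset, cycle_days - offset


print(round(orbits_per_day(103)))   # about 14 orbits a day
print(revisit_gaps(18, 9))          # two equal 9 day gaps
print(revisit_gaps(18, 6))          # gaps of 6 and 12 days
```

With a 9 day offset the two satellites alternate evenly; a 6 day
offset reproduces the 6 day and 12 day intervals reported after the
February 1977 adjustment.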


Table 2.1. LANDSAT MULTISPECTRAL SCANNER SPECTRAL
RANGES OR BANDS.

                                    Wavelength Interval
MSS Band   "Color" Range    (in micrometers)    (in Angstroms)

   4       Green            0.5 to 0.6 µm       5000 to 6000 Å
   5       Red              0.6 to 0.7 µm       6000 to 7000 Å
   6       Photoinfrared    0.7 to 0.8 µm       7000 to 8000 Å
   7       Photoinfrared    0.8 to 1.1 µm       8000 to 11000 Å

2.2.2 Introduction to the Multispectral Scanner System

Incident solar electromagnetic energy reflected from the sur- 
face of the earth to the satellite is focused by an oscillating scan 
mirror onto a set of 24 sensors or detectors in the MSS (Multispec- 
tral Scanner) device. These sensors form an array which one may 
picture schematically as a set of four columns, of six sensors each, 
with one column for each MSS spectral band (Fig. 2.3). 

The instantaneous view which each sensor has of the ground is 
a square of approximately 79 m by 79 m (259 ft by 259 ft). The six 
sensors in a given band view collinear and contiguous resolution 
elements. Thus the set of six sensors in a given column instantane- 
ously sweeps out or views a strip approximately 474 m by 79 m 
(1554 ft by 259 ft) (LANDSAT Users Handbook, 1976). 

The region on the ground viewed by the sensors in a given 
spectral band in one sweep of the mirror from west to east is called 
a swath (Fig. 2.3). It is 474 m wide and sweeps out a length of about 
185 km (1554 ft by 115 mi). That region within a swath which is 
viewed by a single sensor, or a set of the four different sensors in 
a multispectral sense, is called a scan line of the resulting image. 
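
The detector and swath dimensions given above can be cross-checked
with a few lines of arithmetic. The Python sketch below is offered as
a worked check only; the variable names and the unit conversion
constants are introduced here for illustration.

```python
# Worked check of the MSS detector geometry quoted in the text.
M_TO_FT = 3.28084       # feet per meter
KM_TO_MI = 0.621371     # statute miles per kilometer

BANDS = 4               # MSS bands 4, 5, 6 and 7
DETECTORS_PER_BAND = 6  # six sensors in each column
IFOV_M = 79.0           # side of one sensor's ground view, meters
SCAN_LENGTH_KM = 185.0  # length of one mirror sweep, kilometers

total_detectors = BANDS * DETECTORS_PER_BAND   # 24 sensors in all
swath_width_m = DETECTORS_PER_BAND * IFOV_M    # 474 m swept per mirror scan
ifov_ft = IFOV_M * M_TO_FT                     # about 259 ft
swath_width_ft = swath_width_m * M_TO_FT       # about 1555 ft
scan_length_mi = SCAN_LENGTH_KM * KM_TO_MI     # about 115 mi

print(total_detectors, swath_width_m)
print(round(ifov_ft), round(swath_width_ft), round(scan_length_mi))
```

The 1554 ft figure in the text is six times the rounded 259 ft value;
converting 474 m directly gives about 1555 ft.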

Fig. 2.3. SCHEMATIC DIAGRAM OF LANDSAT'S MULTISPECTRAL
SCANNER CONCEPT. Six scan lines constituting a swath
are swept out as shown for each mirror scan. The angle
of the scan lines is caused by the relative motion between
the satellite and the earth's surface. The length of each
scan line is ~ 185 km while its width is 79 m.

The lines and swaths do not lie perpendicular to the ground
orbit track of the satellite because, while the mirror is scanning,
the satellite is moving and the earth is rotating. The velocity of the
mirror and satellite relative to the earth is such that when the mirror
has returned to its starting point and is ready to begin its next
contiguous swath the satellite has moved forward relative to the earth's
surface such that there is no gap or overlap between swaths. The
imagery obtained in this fashion is continuous as the satellite
continues around the earth. Analog magnetic tape of these images is
recorded at a ground tracking station whenever the satellite is within
range. The image may be temporarily stored on board the spacecraft
for subsequent retransmission when it is in the range of a
ground station. Subsequently, these tapes of the continuous image
are replayed and displayed as discrete images of 390 swaths of 2340 
lines yielding 125 lines/cm on a inch by 9-| inch film format. 
Between the display of each discrete image the tapes are rewound 
approximately 10% to create a corresponding duplication or overlap 
on each successive image in a N-S sense. Each successive day the 
satellite moves about 165 km relative to the ground in an E-W sense 
creating a side lap of about 10% as the total scan line length is approx- 
imately 185 km. After 18 days of this sideskipping the given satellite 
closely repeats the same ground path over Taiwan. 
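
The frame dimensions and the side lap described above can be verified
numerically. The following Python sketch is a worked check under the
figures quoted in the text; the variable names are introduced here for
illustration only.

```python
# Worked check of the frame assembly figures quoted in the text.
LINES_PER_SWATH = 6       # one scan line per sensor in a band
SWATHS_PER_FRAME = 390
LINE_WIDTH_M = 79.0       # N-S width of one scan line, meters
LINES_PER_CM = 125        # display density on film
SCAN_LENGTH_KM = 185.0    # E-W length of a scan line
DAILY_SHIFT_KM = 165.0    # E-W shift of the ground track per day

lines_per_frame = SWATHS_PER_FRAME * LINES_PER_SWATH        # 2340 lines
frame_extent_km = lines_per_frame * LINE_WIDTH_M / 1000.0   # about 185 km N-S
film_height_cm = lines_per_frame / LINES_PER_CM             # 18.72 cm on film
side_lap = (SCAN_LENGTH_KM - DAILY_SHIFT_KM) / SCAN_LENGTH_KM  # about 0.11

print(lines_per_frame, round(frame_extent_km, 2), film_height_cm)
print(round(side_lap, 3))
```

The 2340 lines of 79 m each give a frame very nearly 185 km on a side,
and the 20 km difference between scan length and daily shift accounts
for the side lap of about 10%.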

2.2.3 Introduction to the Digital or Discrete Nature
of the LANDSAT Images

The signal recorded at the ground station is in analog form 
and when played back as outlined above provides the basis to produce 
the commonly available black and white or color photographs of the 
LANDSAT images. Each of the 24 MSS sensors measures the inten- 
sity of the reflected solar energy it receives in its respective wave- 
length interval or spectral band and produces a separate output of 
the continuously varying or analog signal. Thus individual black and 
white photographs of each of the four spectral bands or color combi- 
nations can be produced on the ground. The analog recording of 
these signals also provides the source for the digital or discrete 
picture element format of the LANDSAT imagery which is compatible
with digital computer analysis. This process digitally or numerically
samples the analog recorded, continuously variable, electronic signal
from each sensor. This sampling occurs on the average 3,216
times on one satellite scan line of 185 km, and the string of numbers
obtained is recorded on a second magnetic tape in a digital format
which is compatible with the standard digital tape of the electronic
computer. This is called the Computer Compatible Tape (CCT) form of
the LANDSAT image.

The region on the ground for which the reflected solar energy
intensity is measured and numerically recorded is called a pixel
which is short for picture element. A computation of 3,216 times
the 79 m ground resolution noted earlier yields a line length greater
than 185 km as the pixels overlap about 29% along the scan lines.
This yields an effective ground resolution of 79 m (N-S) by 57 m
(E-W) for each pixel. Each pixel is represented by a discrete number
for each spectral band on the CCT. These values range from 0
to 63 in MSS band 7 and from 0 to 127 in MSS bands 4, 5 and 6 with 0
the lowest energy level and 63 or 127 the highest. Each and every
MSS pixel is represented on the CCT by a set of four numbers for the
instantaneous reflected solar energy values measured in each of the
four MSS bands.

2.2.4 LANDSAT Imagery of Taiwan

Four contiguous LANDSAT-1 MSS images of Taiwan were obtained
shortly after launch on November 1, 1972 (Fig. 2.4). One of
these was selected for analysis. Computer compatible tapes (CCT's)
were purchased from the EROS (Earth Resources Observation
System) Data Center of the U.S.G.S. located at Sioux Falls, South
Dakota (Table 2.2). One 185 km by 185 km image consists of four
CCT's, each representing all of the four MSS values for each pixel
for a 46 km wide strip of the image.
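
The sampling and tape figures quoted in Sections 2.2.3 and 2.2.4 can
be cross-checked with simple arithmetic. The Python sketch below is a
worked check only; the variable names are introduced here, and the
bit counts are merely the smallest word sizes able to hold the stated
value ranges, not a description of the actual tape format.

```python
# Worked check of the pixel sampling and CCT figures quoted in the text.
SCAN_LENGTH_M = 185_000.0   # one scan line, meters
SAMPLES_PER_LINE = 3216     # average number of samples per scan line
IFOV_M = 79.0               # nominal ground resolution, meters
CCT_STRIPS = 4              # tapes per 185 km by 185 km image

# E-W spacing between successive samples (quoted as 57 m in the text).
sample_spacing_m = SCAN_LENGTH_M / SAMPLES_PER_LINE

# Length the line would have if the 79 m pixels did not overlap.
unoverlapped_km = SAMPLES_PER_LINE * IFOV_M / 1000.0

# Width of the strip held on each tape (quoted as 46 km in the text).
strip_width_km = SCAN_LENGTH_M / CCT_STRIPS / 1000.0

# Number of distinct values per band and the bits needed to store them.
levels = {4: 128, 5: 128, 6: 128, 7: 64}
bits = {band: (count - 1).bit_length() for band, count in levels.items()}

print(round(sample_spacing_m, 1), round(unoverlapped_km, 1), strip_width_km)
print(bits)
```

The unoverlapped length of about 254 km exceeds the 185 km scan line,
which is why successive pixels must overlap along the scan direction.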


Table 2.2. SPECIFICATIONS OF THE LANDSAT-1 IMAGE ANALYZED.

Date image taken               November 1, 1972
Scene ID No.                   1101-01550
Sun angle                      43°
Sun azimuth                    144°
Cloud cover                    10%
Quality assessment             good
MSS Bands used                 4, 5, 6, 7 (all)
Center coordinates of frame    N 24°24' E 121°05'
Type of product available      20" X 20" B/W prints of Bands 5 & 7
                               CCT's - 7 track 800 BPI Seq. # 1,
                               2 and 3 of 4

Fig. 2.4. FOUR LANDSAT-1 FRAMES COVERING TAIWAN ON
NOVEMBER 1, 1972. NASA LANDSAT ID# shown on
upper left corner of the images. The location of three
of the 1/50,000 topographic maps to be mapped at a
scale of 1/25,000 and identified in Fig. 2.1 is
indicated within image #2.

2.3 Sampling the Land Use and Developing a
Ground Control Data Base

2.3.1 Basic Considerations of the Classification Scheme 

Land use refers to "man's activities on land which are directly 
related to the land" (Clawson and Stewart, 1965). Land cover, on the 
other hand, describes "the vegetational and artificial coverings of 
the land surface" (Burley, 1961). Some land use activities of man 
can be directly related to the type of land cover. For instance, 
using imagery on which rice can be interpreted as the land cover, it 
may subsequently be inferred that farming is the present land use 
activity although not actually visible as such. Other activities,
especially recreational activities, can only be related with difficulty
to land cover by use of remote sensing techniques. However, use of
supplemental information from other sources permits a more functional
approach to the classification of land use (Anderson, Hardy, and
Roach, 1971 and 1976). Variation in land cover is therefore the basis
for any land use classification system employing remote sensing imagery.
The title of "land use mapping" is often applied to remote sensing image
classification activities as a whole which tends, as in this study, to
amalgamate the distinct concepts of mapping land use and land cover.

The ground resolution of the LANDSAT image has been shown
to be nominally 57 m by 79 m or 0.45 hectare. At orbital altitudes
the single 0.45 hectare pixel recorded for each of the four MSS bands
may represent the integration of a variety of spectral responses for
the land covers it contains. Thus an individual pixel may represent
a gross generalization (or aggregation) of the 0.45 hectare area it is
measuring (Anderson, 1971). An area of 0.45 hectare in Taiwan
may consist of several different specific types of land use/land cover.
Therefore, the relationship between the reliability of LANDSAT
images for land use identification and its dependence on distinct
spectral returns for the various land covers must be tested for the
Taiwan case.
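
The aggregation effect described above can be illustrated with a
small numerical sketch. The Python example below first confirms the
0.45 hectare pixel area and then models a mixed pixel as the
area-weighted mean of the responses of the covers it contains. The
linear mixing model, the function name and the response values are
assumptions chosen purely for illustration; they are not measurements
from this study.

```python
# Pixel area from the nominal 57 m by 79 m ground resolution.
PIXEL_EW_M = 57.0
PIXEL_NS_M = 79.0
pixel_area_ha = PIXEL_EW_M * PIXEL_NS_M / 10_000.0   # 0.4503 hectare


def mixed_pixel_value(cover_fractions, cover_responses):
    """Area-weighted mean response of the land covers inside one pixel.

    Linear mixing is assumed here for illustration only.
    """
    total = sum(cover_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("cover fractions must sum to 1")
    return sum(fraction * cover_responses[name]
               for name, fraction in cover_fractions.items())


# Hypothetical band 7 responses on the 0 to 63 scale, not measured values.
responses = {"rice paddy": 20.0, "orchard": 40.0, "settlement": 30.0}
fractions = {"rice paddy": 0.5, "orchard": 0.3, "settlement": 0.2}

value = mixed_pixel_value(fractions, responses)
print(round(pixel_area_ha, 2), round(value, 1))
```

Under these assumed values the recorded number falls between the
responses of the individual covers and matches none of them, which is
the difficulty the small and heterogeneous land use pattern of Taiwan
poses for classification.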

2.3.2 Testing the Scheme on Low Altitude Airphotos 

The aforementioned considerations led to the adoption of a 
land use/land cover hierarchical scheme for initial testing with the 
interpretation of low altitude airphotos (Table 2.3). This scheme 
was subsequently revised by use and provides a logical basis for the 
collection of the ground control (ground truth) data to be used as 
training and verification data in the automated image processing 
procedures employed. Subsequently this same airphoto classification 
scheme provided the basis for the abridged classification scheme 
applied to the LANDSAT imagery. 

Table 2.3. LAND USE CLASSIFICATION SYSTEM CHECKED FOR USE WITH LOW
ALTITUDE BLACK AND WHITE AIRPHOTOGRAPHS. Revised from the Anderson
system (Anderson, Hardy, and Roach, 1971). This system intercombines
land use and land cover. Upon actual application Level III was deleted
as it could not be reliably applied to low altitude black and white
airphotos.

Code  Level I                   Code  Level II                        Code  Level III

100   Urban and built-up lands  110   Commercial and service
                                120   Residential and new community
                                130   Industrial
                                140   Transportation and irrigation
                                150   Institutional
                                160   Strip and clustered settlement
                                170   Mixed
200   Agricultural lands        210   Grains
                                220   Crops                           221   Sugar cane
                                                                      222   Vegetable
                                                                      223   Sweet potato
                                                                      224   Peanut
                                                                      225   Others
                                230   Orchards                        231   Grapes
                                                                      232   Banana
                                                                      233   Oranges
                                                                      234   Others
300   Forested lands            310   Hardwoods                       311   Acacia
                                                                      312   Mixed
                                320   Conifers                        321   China firs
                                330   Bamboo
400   Barren lands              410   Gravels
                                420   Tidal flat
500   Water surfaces            510   Water ways
                                520   Ponds and reservoirs
                                530   Estuaries
                                540   Sediment-laden water
600   Range lands               610   Grassland
                                620   Scattered grass

2.3.2.1 Assembling Grid Sampled Ground Control Data

A grid sampling method tied to the LANDSAT image grid was
devised to test this airphoto classification system, collect specific
point type ground control data and estimate the relative amounts of
land uses in the study area for final map accuracy verification.
Symbolic images were prepared from the LANDSAT CCTs using a 
computer line printer. This was accomplished by assigning an 
alphabetic symbol to a range of the values stored on the digital com- 
puter tapes for the various pixels. These symbols for a particular 
one of the four MSS band values are then printed by a computer line 
printer in their proper spatial or geographic position. For example, 
the intensity of the reflected radiation recorded on the CCT for a 
given pixel and specific one of the four MSS bands takes on values 0 
to 63, thus we might print the letter 

M = "M" overprinted with "I" for the 0-10 range of reflected 
energy, 

U for the 21-30 range, 

+ for the 31-40 range, 

for the 41-50 range, 

0 = "0" overprinted with for the 50-60 range, and leave 

blank for the range over 60. 

This yields a symbolic image clearly representing each pixel on the 
ground area imaged by the satellite (Fig. 2.5). This graymap, as it 
will be called hereafter, provides a large scale, photograph-like 
rendition of the reflected radiation reaching the satellite in one of 
the four MSS bands. It differs from a conventional black and white 
photograph in that each of the pixels or ground resolution cells is a 
discrete symbol. Also, a panchromatic airphoto records all the 
reflected solar energy from about 0.4 to 0.7 µm while the above 
technique produces four graymaps each representing a narrower 
spectral interval or band, e.g. MSS band 5 yields a graymap of the 
0.6 to 0.7 µm reflected energy. 

Fig. 2.5. A SMALL PORTION OF THE LANDSAT MSS BAND 7 LINE PRINTER GRAYMAP 
OF THE TAICHUNG MAP. 1:25,000 scale in true geometry. Line 
number and column number are designated to identify the relative 
location of each pixel. 
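The symbol assignment behind a graymap can be sketched in a few lines of Python. This is only an illustration: the brightness ranges and the symbols in `SYMBOL_RANGES` are hypothetical stand-ins, not the overprinted characters actually used on the line printer in this study, and the function names are invented for the example.

```python
# Hypothetical symbol table: (upper bound of brightness range, printed symbol).
# Darker symbols are given to lower reflected energy; the top range is left blank.
SYMBOL_RANGES = [(10, "#"), (20, "X"), (30, "U"), (40, "+"),
                 (50, "-"), (60, "."), (63, " ")]


def symbol_for(value):
    """Return the printer symbol for one pixel value in the range 0 to 63."""
    for upper, symbol in SYMBOL_RANGES:
        if value <= upper:
            return symbol
    raise ValueError(f"pixel value {value} is outside the 0-63 range")


def graymap(band):
    """Convert a 2-D list of pixel values (one MSS band) into printer lines."""
    return ["".join(symbol_for(v) for v in row) for row in band]


# A tiny 3 by 3 excerpt of one band, values chosen only for illustration.
band7 = [[0, 12, 25],
         [35, 45, 55],
         [61, 63, 8]]

lines = graymap(band7)
for line in lines:
    print(line)
```

Each printed character stands for exactly one cell, which is what allows the transparent topographic map to be laid over the printout.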

The study area is exactly coincident with three of the Taiwan 
basic series of 1:50,000 topographic maps (Fig. 2.1). Each of these 
maps was photographically enlarged exactly two times to provide a 
transparent map at a scale of precisely 1:25,000 and approximately 
1 by 1 meter (40" by 40"). During the preparation of the LANDSAT 
image each pixel is resampled from the CCT in such a fashion that 
the resulting graymaps represent 1:25,000 scale line printer symbol 
maps upon which the transparent topographic map may be overlaid. 
The geometric rectification procedure employed in this operation will 
be discussed in more detail in the next chapter. Original picture 
elements or pixels have been resampled or reformed in this process 
and will now be referred to as discrete ground cells or simply "cells." 
Suffice it to say at this point that the large, transparent 1:25,000 
topographic map could be registered upon the LANDSAT graymaps for 
each of the four MSS bands to an accuracy of ±0.5 cell or alphabetic 
symbol (Fig. 2.6). 

Fig. 2.6. EXAMPLE OF THE REGISTRATION OF A SMALL PORTION OF THE 
TAICHUNG TOPOGRAPHIC MAP UPON THE LANDSAT MSS BAND 7 GRAYMAP. 
1:25,000 scale in the true geometry. Line number and column 
number which are designated by the arrows and bars locate 
the 30 by 30 sample grid and 3 by 3 array, respectively. 
The interior rectangles further emphasize the 3 by 3 arrays 
selected for airphoto interpretation. 

A sampling grid was laid out upon each combination of topographic 
map and graymap so as to mark out every 30th column and 
every 30th line of graymap symbols (Fig. 2.6). This provided the 
regular sampling grid whose actual land use/land cover was 
estimated by airphoto interpretation. This sampling grid, while regular 
in spacing, provides a random sample of the land uses/land covers 
which occur in the area to be mapped. The two maps could be 
misregistered by as much as ±0.5 line printer symbol; thus a group of 
3 by 3 symbols was identified for photointerpretation with the 30 by 
30 sample cell as the center of the 3 by 3 array (Fig. 2.6). The 
topographic map was annotated so as to show the ground location of each 
of the 3 by 3 symbol arrays which represent ~1710 m by ~2370 m 
on the ground. Aerial photographs of Taiwan are treated as sensitive 
material. Thus a blueprint 1:25,000 copy of each of the annotated 
topographic maps together with a preliminary classification 
scheme (Table 2.3) was returned to Taiwan for photointerpretation. 
Several hundred low altitude, black and white airphotos with scale of 
1:16,700 were used as the basis for interpreting the land uses/land 
covers within the sampled 3 by 3 rectangular arrays of cells. Most 
of the airphotos used were taken during November and December 
1973, or about one year after the available LANDSAT image. A small 
portion of the airphotos were taken in the spring of 1974. A Zoom 
Transfer Scope was used to change scale and superimpose a local 
area of the 1:25,000 topographic map upon the corresponding air- 
photos. Localized terrain and cultural features were used to match 
the group of 3 by 3 cells annotated on the map to their proper position 
upon the airphotos. This rectangular group of nine cells was then 
annotated upon the airphotos. Three hundred twenty-three arrays of 
nine cells each were annotated in this fashion upon their respective 
sets of photographs. These photographs were next interpreted for 
the land use/land cover which occurred in each of the arrays of nine 
cells using the preliminary classification system (Table 2.3). The 
interpretation was completed by a professional staff member of the 
Mining Research and Service Organization of Taiwan and checked by 
the Taiwan Forest Bureau. A data form containing a sketch of the 
3 by 3 cells was completed for each sample array by the interpreter 
(Fig. 2.7). Upon it was sketched the land use/land cover of the 
array identified by the respective codes. Additional ancillary data 
was also noted on the form such as the date, quality, etc. of the air- 
photos used together with any comments. 
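The grid layout described above (every 30th line and every 30th column, with a 3 by 3 array around each grid cell) can be sketched as follows. The function name, the 1-based line and column numbering, and the small 100 by 70 test image are assumptions made only for this illustration; they are not taken from the programs used in the study.

```python
def sample_arrays(n_lines, n_cols, spacing=30):
    """Return the 3 by 3 cell arrays centered on every `spacing`-th line and column.

    Lines and columns are numbered from 1, as on a line printer graymap.
    Centers on the last line or column are skipped so that every 3 by 3 array
    lies completely inside the image.
    """
    arrays = []
    for line in range(spacing, n_lines, spacing):
        for col in range(spacing, n_cols, spacing):
            cells = [(line + dl, col + dc)
                     for dl in (-1, 0, 1)
                     for dc in (-1, 0, 1)]
            arrays.append({"center": (line, col), "cells": cells})
    return arrays


# Hypothetical graymap of 100 lines by 70 columns.
arrays = sample_arrays(100, 70)
print(len(arrays), "sample arrays")
print("first center:", arrays[0]["center"])
```

Interpreting the full 3 by 3 array rather than the single center cell is what absorbs the ±0.5 symbol registration uncertainty between map and graymap.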

The land use/land cover of each of the topographic maps was 
summarized by tabulating its occurrence in numbers of individual cells 
(Table 2.4). The majority of the cells were interpreted as having 
a single dominant land use/land cover. However, quite a number of 
cells were identified as containing two different land uses and these 
were counted as 0.5 cell to each of the two categories. The third 
order of detail (Level III) land uses were not tabulated and were 
dropped from the preliminary classification scheme at this point as 
their photointerpretation proved unreliable from the low altitude 
black and white airphotos. 
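The counting rule described above, one full cell for a single dominant land use and 0.5 cell to each category when two uses share a cell, can be written out as a short sketch. The function name and the sample cell list are hypothetical; only the 0.5 split comes from the text.

```python
from collections import defaultdict


def tabulate(cells):
    """Tally interpreted cells by land use code.

    `cells` is a list of tuples, one per cell, holding either one code
    (a single dominant land use) or two codes (a shared cell, counted as
    0.5 cell to each category).
    """
    counts = defaultdict(float)
    for codes in cells:
        if len(codes) not in (1, 2):
            raise ValueError("a cell must carry one or two land use codes")
        share = 1.0 / len(codes)
        for code in codes:
            counts[code] += share
    return dict(counts)


# Four hypothetical cells: two single-use cells and two shared cells.
sample = [(210,), (210, 220), (310,), (220, 160)]
counts = tabulate(sample)
print(counts)
```

Because every cell contributes exactly one unit in total, the category counts always sum to the number of cells interpreted.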
Fig. 2.7. SKETCH OF THE MAP PORTION OF THE DATA FORM 
COMPLETED BY AIRPHOTO INTERPRETATION. The 
center cell occurs every 30 by 30 cells on the 1:25,000 
graymaps. A 3 by 3 array of cells is interpreted to 
minimize the impact of misregistration between the transparent 
topographic map overlay and graymap. Each 
rectangular cell is ~0.45 hectare (~1.1 acre) on the 
ground. Land uses/land covers are identified in each 
cell by three digit code numbers (Table 2.3). 


The assemblage of this data provides a distributed type of 
training set for later input to the LANDSAT image classification 
procedure. At this point it provides an accurate review of the land use 
of 2760 sampled points distributed over the study site and three 
maps (Table 2.4). Since the sample points used were assembled 
from a regular grid, the relative populations of each land use can 
now be computed to provide a basis for the subsequent verification 
of the maps produced by computer interpretation of the LANDSAT 
image (Table 2.5). 

TABLE 2.4. INVENTORY OF THE LARGE SAMPLE OF GROUND CONTROL CELLS INTERPRETED
FROM BLACK AND WHITE AIRPHOTOS. Based on three 1:50,000 topographic maps. Land
areas only were sampled for 3 by 3 arrays of cells yielding 2370 m x 1710 m (on
the ground). A cell covers 0.45 hectare. Columns I and II give the number of
cells at Level I and Level II, respectively.

                                                          Lu-Kang    Taichung   Kuo-Hsing    Combined
Code  Level I             Code  Level II                  I    II     I    II     I    II     I    II
100   Urban lands         110   Commercial               38     0   185    15    27     0   250    15
                          120   Residential                     1          58           0          59
                          130   Industrial                      0          24           0          24
                          140   Transportation                 10          15           0          25
                          150   Institutional                   0          10           9          19
                          160   Clustered                      20          63          18         101
                          170   Mixed                           7           0           0           7
200   Agricultural lands  210   Grains                  300    46   681   265   247    25  1228   336
                          220   Crops                         248         334          87         669
                          230   Orchards                        6          82         135         223
300   Forested lands      310   Hardwoods                 3     3   233   191   777   618  1013   812
                          320   Mixed woods                     0          12         117         129
                          330   Conifers                        0          29          38          67
                          340   Bamboo                          0           1           4           5
400   Barren lands        410   Gravels                  41    10    26    26    31    31    98    67
                          420   Tidal flat                     31           0           0          31
500   Water surfaces      510   Water ways               40    10    13    10     3     3    56    23
                          520   Ponds and reservoirs           27           3           0          30
                          530   Estuaries                       3           0           0           3
                          540   Sediment-laden water            -           -           -           -
600   Range land          610   Grassland                 3     3    33    10    79     7   115    20
                          620   Scattered grass                 0          23          72          95
      Total                                             425        1171        1164        2760
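As a worked illustration of how relative populations follow from a regular grid sample, the sketch below converts the combined Level I cell counts of Table 2.4 into percentages of the sampled land area. The dictionary and variable names are invented for the example, and the printed Table 2.5 may have been rounded differently, so agreement to within about one percentage point is all that should be expected.

```python
# Level I cell counts for the three maps combined, as listed in Table 2.4.
LEVEL_I_CELLS = {
    "Urban lands": 250,
    "Agricultural lands": 1228,
    "Forested lands": 1013,
    "Barren lands": 98,
    "Water surfaces": 56,
    "Range land": 115,
}

total_cells = sum(LEVEL_I_CELLS.values())

# Each cell on the regular grid stands for the same ground area, so the share
# of cells in a class estimates the share of the land area in that class.
percent = {name: 100.0 * n / total_cells for name, n in LEVEL_I_CELLS.items()}

for name, value in percent.items():
    print(f"{name:20s} {value:5.1f}%")
print("total cells:", total_cells)
```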

2.3.2.2 Assembling Areal Ground Control Data 

Areal ground control data was collected by the same procedure. 
Twenty-five rectangular or square sample areas were selected from 
the LANDSAT graymap/topographic map combinations. These areas 
were distributed over the land area of the three topographic maps so 
as to provide a reasonable sample of the variety of land uses which 
occurred in the study area. These small map areas ranged from 30 
by 30 cells on the graymap (2370 m by 1710 m) to 100 by 100 cells 
(7900 m by 5700 m). The map areas were transferred to, and anno- 
tated upon, the same airphotos employed in the previous sample data 
collection using the Zoom Transfer Scope for scale change and local 
topographic fit. The land use inside the sample areas noted on the 
photos was interpreted and sketched on the maps by the same 
photointerpreter noted earlier. Nominally 1:16,700 sketch maps were 
prepared of each sample site map (Fig. 2.8). Some residual air- 
photo distortions remain in these maps as they were not retrans- 
ferred back to the 1:25,000 graymap/topographic map composite. 
However, their rigorous geometric relationship to the LANDSAT 
imagery is not critical. 


TABLE 2.5. ESTIMATION OF THE RELATIVE AMOUNTS OF LAND USE OF THE AREA TO BE
MAPPED. Based on the photointerpretation of the 2760 ground cells provided in
Table 2.4 for the area of the three 1:50,000 topographic maps. Values shown are
the percentage of the total land area projected from the sample data to be of
the given land use type. Columns I and II give Level I and Level II percentages.

                                                          Lu-Kang    Taichung   Kuo-Hsing    Combined
Code  Level I             Code  Level II                  I    II     I    II     I    II     I    II
100   Urban lands         110   Commercial                9     0    16     1     2     0     9   0.9
                          120   Residential                     0           5           0           2
                          130   Industrial                      0           2           0           1
                          140   Transportation                  2           1           0           1
                          150   Institutional                   0           1         0.5           1
                          160   Clustered                       5           6         1.5           3
                          170   Mixed                           2           0           0         0.3
200   Agricultural lands  210   Grains                   71    11    58    23    21     2    45    13
                          220   Crops                          58          29           7          24
                          230   Orchards                        2           6          12           8
300   Forested lands      310   Hardwoods                 1     1    20    16    67    53    37    29
                          320   Mixed woods                     0           1          10           5
                          330   Conifers                        0           3           3           3
                          340   Bamboo                          0           0           1           0
400   Barren lands        410   Gravels                   9     2     2     2     3     3     3     2
                          420   Tidal flat                      7           0           0           1
500   Water surfaces      510   Water ways                9     2     1     1     0     0     2     1
                          520   Ponds and reservoirs            6           0           0           1
                          530   Estuaries                       1           0           0           0
                          540   Sediment-laden water            -           -           -           -
600   Range land          610   Grassland                 1     1     3     1     7     1     4     1
                          620   Scattered grass                 0           2           6           3



Fig. 2.8. LAND USE SKETCH MAP ORIGINALLY PREPARED AT 1:16,700 ON 
BLACK AND WHITE AIRPHOTOS. Representative of 25 such 
square or rectangular sample land use maps distributed 
over the study area. Small town in lower left corner 
noted as land use 160 is Chung-Liao-Li. Three digit code 
numbers designate current land use/land cover (Table 2.3). 
This area is a slightly enlarged portion of the LANDSAT 
graymap shown on Figs. 2.5 and 2.6 from lines 159 to 188 
and columns 48 to 78. 
The sketch maps do not show the originally proposed third order 
of detail of land use (Table 2.3). As was noted earlier, extraction 
of this level of detail proved unreliable from the low altitude 
black and white photographs. It was concluded that color photographs 
of the same scale would provide reliable interpretation at this level 
of detail but they were not available for dates approximating those of 
the available LANDSAT images. These 25 sample maps do not pro- 
vide any information relative to the estimation of the amounts of each 
land use in the total study area such as was extracted from the sam- 
pled data (Table 2.5). The maps do provide detailed spatial or areal 
information for the 25 locations and can be used for subsequent 
training set development for the LANDSAT image processing activity 
and for direct visual checking of the resulting classification maps. 

2.3.3 Initial Land Use Classification System for 
Testing with LANDSAT Imagery 

Spatial resolution has a direct impact on the modification of the 
preliminary classification scheme for use with automated computer 
processing of LANDSAT imagery (Table 2.3). The coarser resolution usually 
leaves the computer unable to specify the exact function of man's 
activity on the land surface, as noted earlier. This means that the 
LANDSAT approach will more readily yield land cover information 
and do poorly on identifying the function which the land cover may 
represent. For example, a photointerpreter can distinguish between 
a grass strip denoting a power transmission line and a grass strip 
along a railway. He uses the increased resolution available to him 
as well as shape information. He may see the poles of the power line 
or the rails and ties and thereby designate the specific function of the 
wider grass strip. The coarser 0.45 hectare LANDSAT resolution 
precludes this level of detail in the computer analysis of land use 
unless ancillary, i.e. non-image, data is employed in the classifi- 
cation scheme. Image processing schemes for use with LANDSAT 
imagery are currently being developed which overlay other ancillary 
information such as power line maps and road maps, etc. (Tom and 
Miller, 1976). This enables the computer to use known information 
on the distribution of known functions to further identify the activity 
which might be conducted in a 0.45 hectare cell. 

The advanced image processing schemes using ancillary map 
data were available in the computer programs used in this study but 
were not tested here. Thus, further modification of the original 
land use/land cover hierarchical scheme was necessary (Table 2.3). 
Urban classes such as industrial (130), institutional (150) and trans- 
portation and irrigation (140) refer specifically to land function and 
were removed. The urban type of strip and cluster settlement (160) 
in Taiwan is usually sparsely distributed among agricultural lands. 
The integration of the reflected solar energy in the 0.45 hectare 
resolution cell does not usually resolve such narrow, sparse urban 
land functions. Waterways (510) and ponds and reservoirs (520) 
refer to spatial as well as functional information available to airphoto 
interpreters and are not distinguished by LANDSAT. The difference 
between clear water and sediment-laden water, or between clear water of 
varying depths, is very distinct, and the categorization of water areas 
was therefore based upon these characteristics. Rangelands (grasslands) 
are usually small and sparse in Taiwan (Table 2.5), are confused 
with agricultural lands, and were omitted. Land use categories were 
retained or added wherever they corresponded with a specific land 
cover type such as commercial (110). 

These considerations in light of the known capabilities of 
LANDSAT imagery yielded a revised classification scheme for a 
combination of land use and land cover (Table 2.6). This test 
scheme contained five gross categories at the first level of detail 
namely urban, agricultural, forested, barren lands and water sur- 
faces. It is subdivided into 14 more detailed second level classes. 
Subsequent testing of this land use/land cover classification scheme 
on LANDSAT imagery will result in further modification, such as 
subdivision of selected second level land cover classes into third 
level land cover classes where it is clear that LANDSAT imagery 
will support such a refinement. 
TABLE 2.6. THE PROPOSED LAND USE CLASSIFICATION SYSTEM 
FOR TESTING ON LANDSAT IMAGERY. Modified from 
the low altitude airphoto scheme (Table 2.3). This scheme 
actually denotes land cover. Water classes may denote water 
depth or sediment concentration and may be interpreted as 
either. 

Code  Level I             Code  Level II
----  ------------------  ----  ----------------
100   Urban lands         110   Commercial
                          120   Mixed
200   Agricultural lands  210   Grains
                          220   Crops
                          230   Orchards
300   Forested lands      310   Hardwoods
                          320   Conifers
                          330   Mixed
400   Barren lands        410   Gravels
                          420   Tidal flat
500   Water surfaces      510   Shallow seawater
                          520   Medium seawater
                          530   Deep seawater
                          540   Fresh water

III. DEVELOPMENT OF LANDSAT TRAINING SETS TO REPRESENT 
THE LAND USE/LAND COVER OF TAIWAN 


3.1 Methodology Used to Improve the LANDSAT Imagery 

3.1.1 Introduction 

Fourteen times a day each of the two U.S. National Aeronautics 
and Space Administration's (NASA) LANDSAT satellites orbits the 
earth collecting resources information from the surface. A brief 
introduction to the Multispectral Scanner (MSS) imaging system on 
each satellite was presented earlier (Section 2.2) and is reviewed 
and supplemented here. These MSS systems aboard the spacecraft 
convert the hue (i.e. four bands) and intensity of the reflected sunlight 
from the earth below into an analog signal representing a series of 
images. These signals are stored on on-board tape recorders for 
subsequent retransmission when the satellite is within range of a 
U.S. tracking station or they are transmitted directly. Additional 
tracking and recording stations are under construction or in operation in 
South America, Africa, Australia, Japan, etc. Taiwan is within direct 
readout range of the Japanese recording station under construction 
although the images used in this study were recorded and retrans- 
mitted to a U.S. station. 
The four overlapping, simultaneous MSS images received at the 
ground station as analog signals are recorded and processed and 
made available to potential users in a photographic form. These 
space photos each cover an area of nominally 185 km by 185 km and 
are available as black and white or color composite photographs 
ranging from scales of 1:250,000 to 1:3,369,000. This multiple date 
coverage of most of the world land area may be purchased from the 
U.S. Department of Interior, EROS Data Center, Sioux Falls, South 
Dakota, which acts as the agent of public distribution of these images for 
the U.S. Government. 

The analog signals representing each MSS four band image may 
also be digitized as described earlier to provide numeric values for 
discrete pixels or ground resolution cells. The resulting computer 
compatible tapes (CCTs) may thus be obtained from the EROS Data 
Center for any imagery which has been recorded (Appendix B). How- 
ever, before this digital imagery can be used it must be corrected or 
preprocessed to remove as many of the systematic errors as possible 
such as those geometric errors caused by the rotation of the earth 
and motion of the satellite. There are also several kinds of non- 
systematic errors or noise involved in the images which may at first 
appear to discourage their quantitative analysis. The effects of 
spatially varying clouds, haze and other atmospheric constituents on 
the propagation of the electromagnetic energy from sun to ground and 
ground to the satellite produce non-systematic noise within the 
individual LANDSAT image. The variations in signal caused by the 
surface features sought such as topography, soil types, vegetation 
changes, etc., also provide a spatially varying component in each of 
the four different MSS bands constituting one LANDSAT scene. The 
MSS sensing system also adds systematic and non-systematic 
noise to the data; for instance, the six-line problem 
caused by the imbalance in the calibration of the six sensors used in 
a swath for a given band. 

The task presented by this imagery is thus much like that in 
cryptography: to break the code and extract the desired information 
from the available signal. Surprisingly, although many competing 
factors affect the recorded image on the CCT, it is still possible by 
appropriate simplification and calibration to extract very quantitative 
information about surface features. 

The tests completed and discussed in detail in the balance of 
this section review the methods used for removal of several of the 
systematic errors and for minimizing the impact of undesirable noise. 
These include geometric rectification of the images and ratioing 
between various pairs of the four MSS bands. Training 
sets or sample data were developed to statistically represent each of 
the desired Taiwan land uses and land covers. Three different procedures 
for assembling this training data were tested. These sets of 
training data were statistically cleaned to remove noise, i.e., other 
surface material types, which might have been inadvertently included 
when selecting representative samples. Finally, the image processing 
algorithms were tested on the three different sets of training data. 
These tests determined which combination of training data and algorithm 
would produce the most suitable land use/land cover maps. 

3.1.2 Geometric Correction Applied 

The systematic geometric corrections such as scaling and skew 
that could be predicted reasonably well were performed without direct 
use of ground geometric control points. More advanced LANDSAT 
geometric rectification procedures necessitate supplying a collection 
of known ground positions which must be located to ±1 cell in the 
uncorrected image graymaps. This approach may work well in countries 
with extensively developed large scale road and other transportation 
nets with very rectangular agricultural cropping patterns to 
supply geometric control points which can readily be located in the 
unrectified graymaps. Obtaining such a collection of geometric 
ground control points is not nearly so reliable in the many countries 
dominated by smaller scale, irregularly laid out agricultural and 
transportation systems. The system employed here worked very 
well on a map by map basis without the direct incorporation of any 
geometric ground control points into the rectification process. The 
correction consists of applying five linear transformations which act 
on the entire image without direct reference to ground control. 

The LANDSAT image consists of discrete samples of reflected 
solar energy over a two-dimensional image space. The image can be 
thought of as a three-dimensional array P(i, j, k) where i are the 
rows or lines of image cells, j are the columns of image cells across 
the scene and k are the four MSS spectral bands. The data values 
for each pixel are non-negative integers having values between 0 and 
127.* The four MSS bands are assumed to be in perfect registration 
so that the problem can be studied as a two-dimensional, single band 
image problem. Linear transformation of elements of the original 
unrectified two-dimensional image space into another more geometrically 
correct two-dimensional space is accomplished by the simultaneous 
application of the following matrices as linear transformations. 

$$
Y = AX, \qquad
Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}, \quad
X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \quad
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
$$

* Data values range from 0 to 127 in MSS bands 4, 5 and 6, while 
they range from 0 to 63 in MSS band 7 because of the differences in 
dynamic ranges of the sensors. 
The application of these linear transformations can be depicted 
graphically (Fig. 3.1). The nodes of the X grid represent the original LANDSAT 
numeric samples of reflected energy from discrete resolution 
cells on the earth. The desired geometrically rectified sample 
cells are represented by the Y grid. The new samples are oriented 
in a rescaled, rotated, and deskewed coordinate system. The 
geometric correction process assigns reflected energy values to 
nodes (or cells) in the new Y grid using the pixel values available 
from the original LANDSAT data on the X grid. The linear transformations 
A denote matrices representing the systematic adjustments 
for (1) scale change, (2) rotation, (3) skew due to the earth's 
rotation, and (4) output scale factor. Correction for the non-linear, 
sinusoidal variation in the oscillation rate of the scan mirror is also 
applied. 
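The idea of folding several systematic adjustments into one matrix A, so that Y = AX is applied in a single step, can be sketched as below. This is a minimal illustration under stated assumptions: the numeric values, the order of the steps, and the helper names (`scale`, `rotation`, `skew`, `matmul`, `apply`) are all hypothetical and are not the coefficients or routines of the program used in this study. The non-linear scan mirror correction is not represented.

```python
import math


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(a, x):
    """Apply the 2x2 matrix a to the column vector x = [x1, x2]."""
    return [a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1]]


def scale(s1, s2):
    return [[s1, 0.0], [0.0, s2]]


def rotation(degrees):
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


def skew(k):
    return [[1.0, k], [0.0, 1.0]]


# Illustrative values only; none of these numbers come from the study.
steps = [scale(1.0, 0.9), rotation(12.0), skew(0.05), scale(2.0, 2.0)]

# Compose the steps into one matrix A; later steps act on earlier results.
A = [[1.0, 0.0], [0.0, 1.0]]
for step in steps:
    A = matmul(step, A)

x = [10.0, 20.0]
y = apply(A, x)
print("A =", A)
print("Y =", y)
```

Composing the matrices once and applying A to every cell position gives the same result as applying the individual adjustments one after another, which is why the whole image can be handled in a single pass.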

Fig. 3.1. RELATIONSHIP OF ORIGINAL AND TRANSFORMED 
LANDSAT IMAGE CELLS. Legend: O, original LANDSAT data (X grid); 
Δ, new, transformed Y grid. The new or output grid 
represents a clockwise rotation and rescaling of the 
original input grid. The distance marked on the figure is 
the total Euclidean error distance introduced by the 
resampling technique (after P. E. Anuta, 1973). 

Application of the total geometric transformation to an input 
image requires new samples on the new Y grid between existing 
samples on the input X grid where there is no sample value. Thus, 
some interpolation scheme is required to resample points if a 
uniform, completely filled output grid is desired. The resampling 
technique used was the nearest-neighbor assignment, in which the 
value of the closest input sample on the X grid is assigned to the 
sample point on the output Y grid. The average position error introduced 
by this geometric transformation of LANDSAT data using the 
nearest-neighbor assignment is about 20 meters or 66 feet (Anuta, 
1973). This error is only slightly more than the 50 feet tolerance 
for 1:24,000 scale topographic maps generated by the U.S. Geological 
Survey. The tolerance of this 1:24,000 map is presumed similar 
to or better than that of the 1:50,000 scale topographic maps of 
Taiwan. 
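Nearest-neighbor assignment itself is simple enough to show in a few lines. In this sketch the function name `resample_nearest`, the `to_input` callback that maps an output cell to fractional input coordinates, and the tiny test image are all assumptions for illustration; the actual program used in the study is not reproduced here.

```python
import math


def resample_nearest(src, n_rows, n_cols, to_input, fill=0):
    """Fill an output grid by nearest-neighbor assignment.

    `src` is the input image as a 2-D list. `to_input(r, c)` returns the
    (row, column) position on the input X grid, possibly fractional, that
    corresponds to output cell (r, c) on the Y grid. Output cells falling
    outside the input image receive `fill`.
    """
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            x_r, x_c = to_input(r, c)
            i = math.floor(x_r + 0.5)   # index of the closest input line
            j = math.floor(x_c + 0.5)   # index of the closest input column
            if 0 <= i < len(src) and 0 <= j < len(src[0]):
                row.append(src[i][j])
            else:
                row.append(fill)
        out.append(row)
    return out


# Stretch a 2 by 3 image to 2 by 4 cells (hypothetical column rescaling).
src = [[1, 2, 3],
       [4, 5, 6]]
out = resample_nearest(src, 2, 4, lambda r, c: (r, c * 0.75))
print(out)
```

Note that in the stretched output the last input column is used twice; the same mechanism, run in the other direction, leaves some input cells unsampled.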

The geometric correction procedure outlined above was applied 
to the available LANDSAT imagery of the study area. The actual 
computer program applied to compute the transformation was part of 
the Colorado State University package of computer programs entitled the 
LANDSAT Mapping System or LMS for short (Appendix A). The 
computer line printer provided the most economic display device for 
reproducing and checking the geometrically corrected four band 
underlay of the three 1:25,000 enlarged, transparent topographic 
maps. Thus the line printer was used to output symbolic graymaps 
of each image band as described earlier and the land use/land cover 
classification maps to be subsequently produced. This line printer 
produces 8 lines per inch and 10 symbols per inch along the line. 
Thus the output grid from geometric adjustment must be rectangular 
in the ratio of 8 to 10 and scaled so that the line printer graymap is 
printed at 1:25,000. The interaction of the output grid (Y grid) of 
these dimensions with the input grid (X grid) of the LANDSAT pixels 
is quite good. The output sample cell size to be displayed on the line 
printer by one symbol at 1:25,000 scale represents nominally 79 m 
N-S and 64 m E-W. The original LANDSAT pixel has already been 
shown to be a rectangle of nominally 79 m by 57 m inclined about 12° 
to the east of north by the inclination of the orbit. The application of 
the nearest-neighbor resampling to the original inclined LANDSAT grid 
by the N-S and E-W output grid is quite satisfactory due to the 
similar size and shape of input and output cells. Should a more 
varied transformation be undertaken, e.g. to match some other map 
scale or a square cell grid, a more significant mismatch would 
occur, resulting in significant oversampling or undersampling. The 
simple nearest-neighbor resampling for 8 by 10 line printer display 
of 1:24,000 to 1:25,000 scale is about optimal when using this procedure 
to rectify LANDSAT imagery (Fig. 3.2). At the 1:25,000 
scale 87.8% of the original LANDSAT cells are sampled once, 1.4% 
are sampled more than once, and 10.8% are not sampled (Miller, 
1975). 
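The nominal output cell size quoted above follows directly from the printer pitch and the map scale: one printed symbol occupies 1/8 inch N-S and 1/10 inch E-W on paper, and each paper dimension is multiplied by the scale denominator. The short sketch below works through that arithmetic; the function and constant names are chosen only for this example.

```python
INCH_IN_METERS = 0.0254


def cell_size_m(scale_denominator, lines_per_inch=8, symbols_per_inch=10):
    """Ground size (N-S, E-W) in meters of one line printer symbol."""
    north_south = scale_denominator * INCH_IN_METERS / lines_per_inch
    east_west = scale_denominator * INCH_IN_METERS / symbols_per_inch
    return north_south, east_west


ns, ew = cell_size_m(25000)
print(f"1:25,000 cell: {ns:.1f} m N-S by {ew:.1f} m E-W")
```

At 1:25,000 this gives roughly 79 m by 64 m, close to the nominal 79 m by 57 m LANDSAT pixel, which is why so few input cells are skipped or duplicated at this scale.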


Fig. 3.2. RESAMPLING EFFICIENCIES OF THE GEOMETRIC 
ADJUSTMENT. Application of the nearest-neighbor 
approach in the resampling at various map scales 
transfers the percentages of the samples shown from the 
input grid (X grid) to the output grid (Y grid). The curves 
apply to maps resampled in the ratio 8 N-S to 10 E-W 
for display at the scales shown on the 8 line/inch 
printer (Miller, 1975). 
Four new revised LANDSAT images are produced upon completion 
of this operation from the four original MSS LANDSAT bands. 
Each new set of four resampled images is deliberately prepared for 
an area slightly greater than each of the three respective 1:25,000 
topographic maps. A graymap at the scale of 1:25,000 is printed on 
the line printer for one of the resulting new four band image files as 
they will now be called. The respective topographic map in the form 
of a 1:25,000 transparency is overlaid upon this line printer map and 
translated N-S and E-W until an accurate match is obtained between 
the topographic map and the graymap features related to the topography. 
This introduces geometric ground control which does not 
require the identification of specific control points on the graymap. 
It is a regional overall fitting of the two maps of the same scale and 
geometry. Once the best fit has been selected the excess or boundary 
cells in the graymap are trimmed off by the computer so that the 
resulting image file exactly matches the map area on the respective 
topographic map (Appendix A). One four band image file is produced 
in this fashion to match each of the three topographic maps (Fig. 2.1). 
These three small image files contain all the image cells dealt with 
in the balance of this study and are much smaller than the original 
185 km by 185 km total image (Fig. 2.4).

3.1.3 Ratioing MSS Bands

Ratioing has been proposed by a number of experimenters as a 
means of reducing non-systematic errors within a multispectral 
image. Ratioing is simply dividing the reflected solar energy or 
radiance recorded in one MSS band by that of another on a cell by 
cell basis. Similar surface cover materials may have been recorded 
at different radiance values in a given spectral band because they 
occur on varying topography (i.e. differing solar lighting conditions),
in areas of spatially varying atmospheric effects and so on. Should 
these perturbing effects be multiplicative in the same amount for the 
two spectral bands, the ratio of the two spectral bands will cancel the 
effect as it multiplies both numerator and denominator. 
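A small numerical sketch of this cancellation follows. The radiance values and illumination factors are invented for illustration only and are not taken from the Taiwan image files.

```python
import numpy as np

# Hypothetical radiances of one surface cover in two MSS bands (three cells).
band5 = np.array([20.0, 20.0, 20.0])
band7 = np.array([60.0, 60.0, 60.0])

# Assumed multiplicative lighting factor per cell, e.g. slopes facing
# toward or away from the sun. The same factor applies to both bands.
illumination = np.array([1.0, 0.6, 1.3])

observed5 = band5 * illumination
observed7 = band7 * illumination

# The common factor appears in numerator and denominator and cancels.
ratio_75 = observed7 / observed5
```

The observed radiances differ from cell to cell in each band, while the 7/5 ratio is identical for all three cells. The cancellation holds only when the perturbation is multiplicative and equal in both bands, as the text states.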

A ratio of the near infrared and chlorophyll absorption bands is 
well correlated with the amount of functioning green biomass on the 
ground surface in grassland areas (Pearson, Tucker and Miller, 
1976). The ratio of MSS bands 7/5 might be an important variable 
for surface biomass classification as MSS band 5 (0.6 to 0.7 micro- 
meters) contains the region of highest chlorophyll absorption and 
MSS band 7 (0.8 to 1.1 micrometers) is a spectral band characterized
by high levels of reflectance for green vegetation (Maxwell, 1974). 
Also, since MSS band 4 (0.5 to 0.6 micrometers) does not contain the 
center of either of the two chlorophyll absorption bands, the ratio 5/4 
might also be an important derived image. An advantage which
adjacent ratios should give for vegetation classification is an improved
signal to noise ratio (Maxwell, 1976).

MSS ratios have been shown to be effective for quantitative 
mapping of suspended solids in water of up to at least 900 ppm. Typical
mid-continent values for variables such as sun angle and wind
speed do not significantly affect MSS ratios for this application 
(Yarger and McCauley, 1975). 

Each of the three topographic map oriented image files created
earlier contains the four MSS bands in the form of four radiance values
for each cell. These four bands are designated 4, 5, 6 and 7. Twelve 
ratios can be computed for the four bands taken two at a time. One 
half or six of these ratios will be the inverse of the remaining six. 

The spatial variation in the ratio of two spectral bands is just the 
same as in the ratio of the inverse of the two bands except in an in- 
verse sense. Thus, no unique differences are available in the in- 
verse ratios and they were omitted. Six ratios between the four 
original MSS bands were thus computed and interspliced back into the 
four band image file using the LMS programs (Appendix A). Each cell in
this 10 band/ratio image file is represented by 10 values, one for MSS 
bands 4, 5, 6, and 7 and ratios 5/4, 6/4, 7/4, 6/5, 7/5, and 7/6. 
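The construction of the 10 band/ratio file can be sketched as follows. This is an illustrative outline only and not the LMS programs of Appendix A. The function names, the array layout, and the handling of zero-radiance cells are assumptions.

```python
from itertools import combinations
import numpy as np

BANDS = (4, 5, 6, 7)

def ratio_pairs(bands=BANDS):
    """The six unique (numerator, denominator) pairs, higher band on top."""
    return [(hi, lo) for lo, hi in combinations(bands, 2)]

def add_ratios(image):
    """image: array of shape (4, rows, cols) holding bands 4, 5, 6 and 7.
    Returns a (10, rows, cols) stack: the 4 bands followed by the 6 ratios."""
    index = {band: i for i, band in enumerate(BANDS)}
    image = image.astype(float)
    ratios = []
    for num, den in ratio_pairs():
        d = image[index[den]]
        # Assumption: cells with zero radiance in the denominator get ratio 0.
        ratios.append(np.divide(image[index[num]], d,
                                out=np.zeros_like(d), where=d != 0))
    return np.concatenate([image, np.stack(ratios)], axis=0)
```

The pairs are generated in the order 5/4, 6/4, 7/4, 6/5, 7/5, 7/6, which matches the order listed in the text. The six inverse ratios are never formed.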

3.2 Selection and Evaluation of Training Sets

3.2.1 Introduction to Computer Image Classification

A land use/land cover map is prepared by computer classification
using a process which recognizes groups of image cells or
classes whose members have selected multispectral characteristics
in common. This is a statistical process which may be implemented
on a digital or analog computer. Ideally, these classes or groups of
cells should be mutually exclusive and exhaustive. In a statistical
sense this means that each cell belongs to, and can be assigned to,
one and only one class, and that every cell in the domain of
interest can be assigned to one of the classes based on its
multispectral characteristics. These rigorous requirements are difficult
to fulfill and often are not totally achieved in practice. The land use/
land cover classes or groups of cells sought in this application are
based on the 10 band/ratio multispectral properties possessed by
the cells in the image files. A class is formed by grouping together
a small, representative number of those cells in the image files that
are alike and represent a known, selected land use or land cover.
Likeness of the cells assembled together to represent one class is
specified by statistical similarity in the radiance values recorded for
those cells for one or more of the MSS bands/ratios. Optimum
classification will group image cells together into classes which are
separated from one another in one or more MSS bands/ratios by
discontinuities in the ranges of their observed radiances (Siegal,
1976).

There are two basically different, general approaches to classification
mapping with LANDSAT images. The classification can be
"unsupervised", in which the boundaries between land cover types are
objectively determined by a computer algorithm that delineates natural
clusters in a spectral sense. The "supervised" approach, on the
other hand, uses training areas of sample cells selected to represent 
each class by the human analyst. Supervised classification requires 
each training area or group of image cells to be representative of a 
specific land cover of interest based upon "a priori" knowledge re- 
ferred to as ground control or "ground truth" data. "A priori" 
ground control or "ground truth" information may be collected on the 
ground, with airphotos or, more logically, a combination of both. 
Statistics such as mean and variance are computed for all selected 
cells for each class and spectral band. These statistical representations
of each land use/land cover are used "to train" various automated
techniques to identify all other unknown cells within the
LANDSAT image file which have statistically similar multispectral 
characteristics. The supervised approach is the only approach 
tested in this study. However, the unsupervised approach is very 
useful and should not be overlooked, especially when dealing with 
areas where ground control data is non-existent or difficult to obtain.

The supervised classification scheme requires that at least
one training set or group of image cells be selected to define each
land use/land cover class or theme. These training sets should
be representative of each of the land use/land cover classes to be
investigated. They in turn constitute a small subset of the original
image file in n-dimensional multispectral space, where n = 10 and
each dimension is a spectral band or a ratio of bands. The classification
scheme tested here is discriminant analysis which uses the
selected training sets to define "volumes" in this n-space. Each of the
remaining cells in the image file which are not part of the training 
sets may subsequently be checked to see which of these n-dimensional 
volumes it best fits in a statistical sense thus defining its unknown 
land cover. A more technical, mathematical expression of this 
approach has been included (Appendix B). 
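As a rough illustration of how training sets define such volumes, the sketch below assigns each cell to the class with the highest Gaussian likelihood. It assumes equal prior probabilities and a separate covariance matrix per class. The formulation actually used is the one given in Appendix B, which may differ, and all function names here are hypothetical.

```python
import numpy as np

def fit_signatures(training_sets):
    """training_sets: dict of class name -> array (n_cells, n_channels),
    with at least two channels. Returns name -> (mean vector, covariance)."""
    return {name: (cells.mean(axis=0), np.cov(cells, rowvar=False))
            for name, cells in training_sets.items()}

def discriminant_scores(cells, signatures):
    """Gaussian log-likelihood (up to a constant) of each cell per class."""
    names = list(signatures)
    scores = np.empty((len(cells), len(names)))
    for j, name in enumerate(names):
        mean, cov = signatures[name]
        diff = cells - mean
        # Squared Mahalanobis distance of every cell from the class mean.
        mahal = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        scores[:, j] = -0.5 * (mahal + logdet)
    return names, scores

def classify(cells, signatures):
    """Assign each cell to the class whose volume it fits best."""
    names, scores = discriminant_scores(cells, signatures)
    return [names[k] for k in scores.argmax(axis=1)]
```

Each class mean and covariance matrix describes an ellipsoidal volume in the n-space, and a cell is assigned to the volume it fits best in the statistical sense described above.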

3.2.2 Factors Affecting Selection 

The previous discussion shows that the selection of the training 
sets to represent each land use/land cover class is the most impor- 
tant part of this computer analysis of LANDSAT imagery. These 
training sets must be a collection of sample cells which is repre- 
sentative of the total population of the land use/land cover class in 
the related image files. The quality of the final classification map 
for each land use/land cover class depends in large part on how
well the training set represents it.

The population of a well defined land use/land cover class
should approximate a multivariate, normal distribution. A random
sampling method might be employed to get an unbiased training set
which is representative of each specific class. This sampling procedure
will be tested here although it raises difficulties of
economy and timeliness.

The size of the training set representing each land use/land cover
class is also a critical point. It may appear that the bigger the 
sample the more representative and better the training set will be. 
The subsequent development of the statistical representation of a 
land use/land cover class is usually an iterative procedure and if 
sample size is big the cost is consequently high. Also, the larger 
the sample the greater the risk of including cells which are not re- 
lated to the land cover sought. However, the minimum number of 
sample cells should be at least greater than the number of spectral 
bands and ratios should it be necessary to invert the covariance 
matrices to obtain the discriminant functions. Finally, it would not 
be surprising if several times that minimum number of samples was 
needed to smooth out statistical fluctuations and obtain a really good 
estimate of the population (Duda and Hart, 1973). As a rule of 
thumb, 30 times the number of spectral bands/ratios is a reasonable 
lower limit on the size of training set for a given class (Smith, 1976). 
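The two size limits can be worked through for the 10 band/ratio case. The sketch below is illustrative only. The helper names are hypothetical, and random numbers stand in for real radiance values.

```python
import numpy as np

def training_set_limits(n_channels, factor=30):
    """(smallest size that allows the covariance matrix to be inverted,
    rule-of-thumb lower limit of `factor` cells per band/ratio)."""
    return n_channels + 1, factor * n_channels

def covariance_rank(n_cells, n_channels, seed=0):
    """Rank of the sample covariance matrix of random stand-in data."""
    rng = np.random.default_rng(seed)
    cells = rng.normal(size=(n_cells, n_channels))
    return int(np.linalg.matrix_rank(np.cov(cells, rowvar=False)))
```

For the 10 bands/ratios used here, `training_set_limits(10)` gives 11 cells as the bare minimum for inversion and 300 cells as the rule-of-thumb lower limit. With only 10 cells the 10 by 10 covariance matrix is rank deficient and cannot be inverted.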

Three major problems were encountered in this study during
the selection of the training sets. They were (1) temporal inconsistency
between the date of the collection of the LANDSAT image
and ground control data, (2) misregistration of the ground control 
data, and (3) lack of rapid, direct communication between this 
study and those responsible for collecting the ground control data. 

(1) Temporal inconsistency between the images and ground 
truth was inevitable in this application and many others. The only 
available LANDSAT imagery was taken on November 1, 1972. The 
ground control information available for this study included three 
1:50,000 topographic maps published in 1970 and a collection of 
1:16,700 B & W airphotos taken on various dates one or two years 
after the LANDSAT image. 

(2) Misregistration between the ground control cells and 
image cells was particularly critical in the test of the grid cell 
sampling method. Specific geometric control points were not available
for the geometric corrections applied. The average position
error which resulted was ± 1 or 2 cells. Misregistration thus
occurs while transferring the ground control cells from graymap to
airphoto for identification. The error in its final location could
easily average one cell. This one cell displacement
problem may not be so serious in areas of large scale homogeneous
land uses. But it may markedly affect the representative nature of
training sets collected for "noisy" or small scale land use patterns
as in Taiwan.

* LANDSAT geometric rectification programs are currently
available using ground control points which achieve accuracies of
RMS = ± 0.5 cells for the entire LANDSAT image.

(3) Lack of rapid, direct communications between the collector 
of the ground control and those performing the image analysis in the 
U.S. was caused mainly by the large distance between the two coun- 
tries. It takes at least two weeks to have a two-way exchange by air 
mail. Better communications between these two functions would yield 
better training sets. 

3.2.3 Statistical Cleaning Applied 

A training set is usually obtained by selecting one or more 
rectangular or irregularly bounded groups of cells within a larger
region previously identified on the ground or with airphotos as repre- 
senting the desired land cover class. A training set can also be 
assembled from a sampled group of discrete cells which have been 
previously identified as representing the desired class. The first or 
area method overlooks the possibility that some of the individual cells 
within the specified training sets may not be of the desired class or 
may be excessively noisy. The second or point method is very sensitive
to mis-selected points due to the misregistration of the point
ground control data on the graymaps. Statistical cleaning of training
sets has been proposed as a method to reduce the noise incorporated
into the training sets by these and other related errors (Maxwell,
1976). However, one might argue that the class heterogeneity
is really not noise but an integral part of the land use/land cover
class. Thus, if a considerable percentage of the cells representing 
a class were removed from its training sets, the remaining cells 
might be too specific to represent the real, diverse nature of that 
land use/land cover class. The procedure tested and described be- 
low uses as a rule-of-thumb that no more than 20% of the cells 
representing a class will be removed in a given iteration. 

The statistical cleaning was accomplished iteratively. First, the
mean vector and covariance matrix (the spectral signatures) were
computed for each class from the original, unaltered training sets.
Then the "a posteriori" probabilities that each cell in each training
set belonged to each land use/land cover class were computed. Cells
were deleted from a given training set if they had a low probability
of belonging to the class which they were originally selected to
represent and/or a high probability of belonging to one of the other
classes. Proceeding iteratively, a new mean vector and covariance
matrix were computed for the cells which remained to represent
each class (> 80%) and additional cells were deleted by the same
criterion. Usually two or three iterations were enough to provide
adequate cleaning, which was indicated by high "a posteriori"
probabilities for the remaining cells (Maxwell, 1976). Limiting
removal to no more than 20% of the cells in a training set in any
given iteration "points" the training set toward the numerically
dominant land use/land cover in the set.
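One cleaning iteration might be sketched as below. This is a minimal outline under stated assumptions, not the procedure of Maxwell (1976). It assumes Gaussian class models with equal priors. The deletion threshold of 0.5 is an arbitrary illustrative value, and only the 20% cap is taken from the text.

```python
import numpy as np

def log_gaussian(cells, mean, cov):
    """Gaussian log-likelihood of each cell, up to an additive constant."""
    diff = cells - mean
    mahal = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return -0.5 * (mahal + np.linalg.slogdet(cov)[1])

def posteriors(cells, signatures):
    """Posterior probability of each class for each cell (equal priors)."""
    logp = np.column_stack([log_gaussian(cells, m, c) for m, c in signatures])
    logp -= logp.max(axis=1, keepdims=True)        # stabilise the exponentials
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

def clean_once(training_sets, threshold=0.5, max_fraction=0.20):
    """One cleaning iteration over a list of (n_cells, n_channels) arrays."""
    signatures = [(t.mean(axis=0), np.cov(t, rowvar=False))
                  for t in training_sets]
    cleaned = []
    for k, cells in enumerate(training_sets):
        own = posteriors(cells, signatures)[:, k]  # probability of own class
        limit = int(max_fraction * len(cells))     # never drop more than 20%
        candidates = np.argsort(own)[:limit]       # least probable cells first
        drop = candidates[own[candidates] < threshold]
        keep = np.ones(len(cells), dtype=bool)
        keep[drop] = False
        cleaned.append(cells[keep])
    return cleaned
```

Calling `clean_once` repeatedly on its own output corresponds to the successive cleaning levels. The signatures are recomputed from the surviving cells at the start of each call.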

The impact of statistical cleaning on the classification results 
for three methods of selecting training sets was carefully investi- 
gated. The discriminant functions were computed from and applied 
back to the cells of the training sets as an indication of their accuracy 
to discriminate or map the unknown cells. Deleting those cells with 
low probability of belonging to the training set representing the class 
was tested to determine if the remaining cells yielded a "better" 
training set. A new discriminant analysis and cleaning activity is 
iteratively performed with the remaining points as noted above. A 
measure of the ability of the modified training sets to represent dis- 
crete, mappable land use/land cover types can be obtained after each 
cleaning iteration. Apparent training set accuracy is the percentage
of the cells in a given class(es) which are
actually assigned to the correct class(es) by the discriminant function
at that iteration or cleaning level. The cells which are correctly
assigned to the proper class(es) are divided by the number of cells
input to that step representing the class(es) and multiplied by 100 to
yield this measure of accuracy in percent. Thus at any level of
cleaning the

$$\text{Apparent Training Set Accuracy} = 100 \times \frac{\text{Correctly Classified Cells}}{\text{Cells Input to Form the Discriminant Function}}$$

The numerator of this fraction can be expected to hold reasonably 
well while the denominator decreases as the number of cells repre- 
senting a class(es) decreases with successive cleanings. Thus, the 
apparent training set accuracy increases as cells are cleaned or 
removed from the original training sets. 

Actual training set accuracy is computed by dividing the cells 
which are correctly assigned to a class(es) at any level of cleaning by 
the original number of cells selected to represent that class before 
any statistical cleaning is applied. The fraction obtained is multiplied
by 100 to convert it to percent as

$$\text{Actual Training Set Accuracy} = 100 \times \frac{\text{Correctly Classified Cells}}{\text{Original Cells Selected to Represent the Class(es)}}$$

The denominator of this fraction is fixed at the original number of 
cells for each successive cleaning of a class(es) while the numerator 
fluctuates to indicate the actual impact of the cleaning procedures on
the classification matrices. Statistical cleaning is designed to remove
those cells erroneously included in the group of cells selected
to represent a class. The revised classification matrices computed
after each cleaning are applied to all the cells originally selected to
represent the class. The revised matrices have been improved by
the cleaning process if some of the members of the original group of
cells which were incorrectly classified are now "pulled back" or
correctly classified.
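The difference between the two measures can be seen in a short worked sketch. The cell counts are invented for illustration and do not come from this study.

```python
def apparent_accuracy(correct_among_retained, retained_cells):
    """Percent correct among the cells still in the cleaned training set."""
    return 100.0 * correct_among_retained / retained_cells

def actual_accuracy(correct_among_original, original_cells):
    """Percent correct when the revised function is applied to all
    cells originally selected to represent the class."""
    return 100.0 * correct_among_original / original_cells

# Hypothetical class: 200 original cells, 150 correct before cleaning.
before = actual_accuracy(150, 200)            # apparent and actual coincide

# After one cleaning: 160 cells retained, 150 of them correct. The revised
# function applied to all 200 original cells classifies 152 correctly.
apparent_after = apparent_accuracy(150, 160)
actual_after = actual_accuracy(152, 200)
```

In this invented case the apparent accuracy rises from 75% to 93.75% largely because the denominator shrank, while the actual accuracy moves only from 75% to 76%, reflecting the two cells "pulled back" into the class.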

Further understanding of these two measures of evaluating 
statistical cleaning may be achieved by example. A specific agri- 
cultural field is selected as a training set to represent a given agri- 
cultural crop cover class. The field contains areas of good homo- 
geneous crop canopy and small areas of trees and areas of the crop 
mixed with weeds. Statistical cleaning may remove the cells repre- 
senting trees as they have low probability of being the crop and high 
probability of belonging to another class representing trees. A por- 
tion of the cells representing the weed/crop mix would have been 
incorrectly classified before cleaning. Statistical cleaning is applied 
to improve the classification matrices for the crop by removing the
tree cells and the value of this is measured by what happens to the
weed/crop mixed cells. Pulling them back into the crop class may
provide the best and most appropriate map of the distribution of this crop
type. This may be evaluated by examining the actual training set 
accuracy at each successive level of cleaning where the numerator 
should increase as error or tree cells are removed from the training 
set for the given class. 

Apparent and actual training set accuracy are computed and
examined for each of the three approaches used to compose the
training sets. At the outset it should be clearly understood that if the
training sets are roughly picked by an inexperienced user a meaningful
increase in actual training set accuracy might accompany the
statistical cleaning. Training sets which are carefully selected to 
represent each desired class may show little improvement with 
successive cleaning. 

The final determination of the impact of this statistical cleaning 
must be made by consideration of what it does to improve the accuracy 
of the final, total classification map. Examination of the training set
accuracies computed in this effort only hints at the impact of the procedure
on the actual map production.

3.2.4 Non-Supervised Method

The initial training set selection was completed without benefit 
of airphoto or other direct ground control information. The only 
information available was the 1:25,000 graymaps of MSS band 5 and 7, 
the 1:25,000 topographic overlay, and the knowledge of the test area 
possessed by four Taiwan resource specialists present in the U.S. 

The supervised method of collecting training sets has come to imply 
that specific ground control information was used as a basis for 
training set selection. Unsupervised image classification is a quite 
different analysis procedure employed when no ground control data is 
known and no training sets are to be employed. The procedure evaluated
here used the supervised approach without benefit of ground
control and is termed "non-supervised" to avoid confusion with either
of these two accepted procedures.

The graymaps of MSS band 5 and 7 were carefully examined.
Homogeneous areas in gray tones were visually selected by examining
both graymaps simultaneously. Eighteen potentially separable,
homogeneous land use/land cover classes were selected and their
estimated land use/land cover or water type class assigned based
upon the 1:25,000 topographic map overlay and the judgment of the
panel of four Taiwan resource specialists. This procedure may be
graphically represented as a plot in two dimensions with each axis
showing the magnitude of the radiance values recorded for each cell
in MSS band 5 and 7. This plot is referred to as two dimensional
spectral space and can be used to visually estimate the separability
of each potential land use/land cover class. The cell values* in
area A range from 35 to 45 in Band 5 and from 73 to 255 in Band 7
(Fig. 3.3). The cell values in area B range from 43 to 255 in Band 5
and from 59 to 75 in Band 7 (Fig. 3.3). Cell values in area C range
from 0 to 36 in Band 5 and from 0 to 75 in Band 7. MSS band 4 is
reasonably similar to Band 5 and 6 is similar to 7; thus these three
ground areas represent three different surface materials which may
be mapped reasonably well in the larger four dimensional spectral
space represented by all four MSS bands. Training sets were thus
selected to represent 18 land use/land cover classes using only the
hierarchical classification scheme (Table 2.5), graymaps, topographic
information, and available knowledge of the area.

* The cell values were multiplied by 2 for MSS band 4, 5 and 6, and
by 4 for band 7 in order to increase the data range to a uniform 0 to
255 for use in the LMS software package.

[Figure 3.3 plot. Horizontal axis: Cell Values (MSS Band 5).]

Fig. 3.3. SIMPLE SEPARATION OF CLASSES A, B AND C IN TWO
DIMENSIONAL SPECTRAL SPACE. This simple definition
is often referred to as level slicing.
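The level slicing of areas A, B and C in Fig. 3.3 can be sketched with the cell value ranges quoted in the text. The rectangles touch at their edges, so the sketch assumes that the first matching area wins. That tie rule, and the function name, are assumptions made for illustration.

```python
# (Band 5 low, Band 5 high, Band 7 low, Band 7 high) for each area,
# using the scaled 0 to 255 cell values quoted in the text.
SLICES = {
    "A": (35, 45, 73, 255),
    "B": (43, 255, 59, 75),
    "C": (0, 36, 0, 75),
}

def level_slice(band5, band7, slices=SLICES):
    """Return the first area whose rectangle contains the cell, else None."""
    for name, (lo5, hi5, lo7, hi7) in slices.items():
        if lo5 <= band5 <= hi5 and lo7 <= band7 <= hi7:
            return name
    return None
```

A cell with Band 5 = 40 and Band 7 = 120 falls in area A, while a cell with Band 5 = 100 and Band 7 = 200 falls in none of the three areas and would remain unassigned under this simple scheme.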

The procedure yielded five classes of grains representing
distinctly different fields of rice probably in different stages of
growth. Since no ground control was collected at the time of the
LANDSAT imaging, it was not possible to place specific names on
these and other individual agricultural classes. At this point they may
only be identified as distinct, mappable classes. Offshore water
classes were designated by selecting training set rectangles in the
homogeneous areas of progressively deeper water identified on the
topographic map. Less than 10 meters was designated shallow seawater,
10 to 20 meters as medium and more than 20 meters as deep. Considerable
confusion between water depth and sediment load was thus
possible and cannot be resolved without more information on
actual suspended sediment and turbidity distribution at the time of
imaging. Confusion was encountered between the urban land use classes
and grain classes as rice is grown in and about the urban portions of
the test area. Specific training sets for urban classes are thus difficult
to identify and several proposed urban categories were omitted.

The training sets selected in this fashion consisted of several 
rectangular groups of cells totaling 50 to 250 cells and representative 
of each of the 18 land cover/water type classes sought. These collections
of cells were used to compute a discriminant function which
was then tested back upon the same cells to provide an evaluation of
how well it can separate or map the cells from which it was prepared.
The resultant assignment of all the known training set cells into the
18 classes provides a training set accuracy matrix which indicates
how well the mapping function will perform (Table 3.1). This
matrix shows how each of the original training set cells in each class
(horizontal dimension of matrix) was assigned to each class by the
discriminant function (vertical dimension). The cells which were
correctly assigned occur on the diagonal, i.e. they were selected to
represent rice and are subsequently classified as rice. Those cells
which were incorrectly identified or misclassified occur off the
diagonal. The number of cells on the diagonal for each class divided
by the number of cells representing that class is a figure of merit
called training set accuracy and is multiplied by 100 to obtain a
percentage. All the cells on the diagonal divided by all the cells in all
the training sets provide an overall figure of merit or training set
accuracy for the mapping or discriminant function being tested. The
overall training set accuracy for this initial test of the non-supervised
training sets using all 10 MSS bands/ratios was 68.6% and varied
widely among the 18 classes (Table 3.1).
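The two figures of merit can be computed from a training set accuracy matrix as sketched below. The function name is hypothetical. The layout follows the text, with the selected class along the horizontal dimension (columns) and the assigned class along the vertical dimension (rows).

```python
import numpy as np

def training_set_accuracy(matrix):
    """matrix[i, j] = number of cells selected to represent class j
    that the discriminant function assigned to class i.
    Returns (per-class accuracy in percent, overall accuracy in percent)."""
    matrix = np.asarray(matrix, dtype=float)
    per_class = 100.0 * np.diag(matrix) / matrix.sum(axis=0)
    overall = 100.0 * np.trace(matrix) / matrix.sum()
    return per_class, overall
```

For the non-supervised training sets, the overall figure reported with Table 3.1 is 1762 correct identifications out of 2570 cells, or 100 × 1762 / 2570 = 68.6%.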

TABLE 3.1. TRAINING SET CLASSIFICATION ACCURACY USING THE “NON-SUPERVISED” TRAINING SETS. 18 classes
showing training set accuracy in percent using 10 channels (4 LANDSAT MSS bands and their 6 ratios).
Overall accuracy = 68.6% obtained by 1762 correct identifications (diagonal) divided by 2570 total samples in all training sets.

Discriminant analysis can be made to proceed in a stepwise
fashion so that each of the 10 MSS bands/ratios is added
in its optimal order. This approach does not alter the final accuracy
achieved using all 10 MSS bands/ratios but it determines
whether some lesser combination of bands and ratios will achieve an
acceptable portion of this final 10 band/ratio accuracy. The greater
the number of bands and ratios selected for the final discriminant
function the greater the cost of its application to the total image file.
Once the order of the addition of the bands/ratios has been determined
a new training set accuracy matrix (Table 3.1) can be computed
after the addition of each band or ratio in the prescribed order.
Overall and individual class training set accuracy can thus be determined
and plotted for each band or ratio added in the stepwise fashion
(Fig. 3.4). This graphically portrays the accuracy achieved at the
addition of each intermediate band or ratio relative to that achieved
by the last or 10th band/ratio. An easy selection may thus be made
as to the number and combination of bands/ratios needed to achieve
an acceptable and economic combination.
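The idea of adding bands and ratios one at a time and then choosing a cut-off can be sketched with a plain greedy forward selection. This is a stand-in for illustration and is not the stepwise discriminant analysis statistic used in the study. The `score` callable, which should return the training set accuracy for a given list of channels, and the 95% cut-off are assumptions.

```python
def forward_selection(channels, score):
    """Greedy ordering: at each step add the channel that gives the
    highest score when combined with those already selected."""
    remaining = list(channels)
    selected, curve = [], []
    while remaining:
        best = max(remaining, key=lambda c: score(selected + [c]))
        selected.append(best)
        remaining.remove(best)
        curve.append(score(selected))      # accuracy after this addition
    return selected, curve

def smallest_acceptable(curve, fraction=0.95):
    """Number of channels needed to reach `fraction` of the final accuracy."""
    target = fraction * curve[-1]
    return next(i + 1 for i, acc in enumerate(curve) if acc >= target)
```

The returned curve corresponds to one line on a plot such as Fig. 3.4, and `smallest_acceptable` picks the point at which further bands or ratios add little relative to their cost.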

Statistical cleaning was evaluated for use with these training
sets. This requires that the two computations of training set
accuracy described earlier be performed. It is thus possible to plot
apparent training set accuracy for each band/ratio added as well as actual
training set accuracy (Figs. 3.4 and 3.5). A separate curve for each
type of accuracy is obtained for each cleaning operation applied.
Three successive iterations of cleaning were performed yielding four
training set accuracy curves of each type where the 0 level of cleaning
represents the initial case where no cells have been removed (Fig.
3.4). The initial or 1st cleaning imposed on these training sets provides
a marked increase in apparent training set accuracy at all
bands/ratios added and smaller increases occur for each successive
2nd and 3rd iteration (Fig. 3.4). At the addition of the 10th band
after three levels of cleaning an apparent overall training set accuracy
of approximately 95% is achieved. Examination of the same
graphical portrayal of actual training set accuracy shows no increase
due to this statistical cleaning procedure (Fig. 3.5). The cleaning
procedure has successively removed cells from the training sets
which do not appear to belong to the respective classes based upon
their "a posteriori" probabilities. This appears to have little effect in
"bringing back" or correctly classifying that portion of the cells which
were not deleted by the statistical cleaning criteria but had not been
correctly classified (Fig. 3.5).

[Figure 3.4 plot. Horizontal axis: LANDSAT MSS Band or Ratio Added, with separate tick rows for each cleaning.]

Fig. 3.4. APPARENT INCREASE IN TRAINING SET ACCURACY
ACHIEVED AT EACH LEVEL OF STATISTICAL
CLEANING. Eighteen classes are represented based on
classification by the 10 MSS bands/ratios.

[Figure 3.5 plot. Horizontal axis: LANDSAT MSS Band or Ratio Added, with separate tick rows for each cleaning.]

Fig. 3.5. ACTUAL INCREASE IN TRAINING SET ACCURACY
ACHIEVED AT EACH LEVEL OF STATISTICAL
CLEANING. Eighteen classes are represented based on
classification by the 10 MSS bands/ratios.

Examination of the curves of actual training set accuracy 
clearly shows that most of the accuracy was achieved by the addition 
of the 4th or 5th band or ratio (Fig. 3.5). This indicates that a 
selection of four or five MSS bands and ratios would suffice without 
cleaning and produce a final classification map of 18 classes with an 
accuracy based upon a training set accuracy of 68%. 

3.2.5 Supervised Method 

The collection of specific ground control data on a grid for an
array of 3 by 3 cells at 30 by 30 cell spacing has been described
(Section 2.3.2.1). The land use/land cover identity of each of the
sample cells was obtained by airphoto interpretation. Those individual
cells interpreted as containing two land use or land cover
types were assigned to that one of the two classes which dominated
the remainder of the cells in its 3 by 3 array. No sample cells were
interpreted for the offshore area of tidal flats and areas of seawater.
Thus, the same rectangular training sets were used here for these
water type classes as were selected for the non-supervised approach.
The 2760 grid sampled cells were reassembled into groups of cells
representing each of 14 land use/land cover classes. The number of
cells to represent each class is directly proportional to the relative
amount of that class in the study area and well represents the natural
variability within each class. For example, only 15 cells represent
commercial land use while 824 represent the hardwood land cover.
Test classification proceeded exactly as outlined for the non-supervised
approach. The overall and individual class accuracy can
be interpreted from the 10 band/ratio training set accuracy matrix
(Table 3.2). An overall accuracy of only 42.3% is achieved, ranging
from a low of 14% for conifers to a high of 96% for medium seawater.

Three iterations of statistical cleaning were applied to these 
training sets in a stepwise fashion yielding apparent increases in 
accuracy at each level (Fig. 3.6). Examination of these four curves 
of actual training set accuracy implies that no real effect has been 
achieved (Fig. 3.7). A reasonable approximation of the accuracy 
achieved by the 10 bands/ratios occurred at the end of only two steps 
representing the addition of only ratio 6/4 and band 5. 


TABLE 3.2. TRAINING SET CLASSIFICATION ACCURACY USING THE “SUPERVISED” TRAINING SETS. 14 classes 
showing training set accuracy in percent using 10 channels (4 LANDS AT MSS bands and their 6 ratios). 
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Overall accuracy = 42.3% obtained by 1321 correct identifications (diagonal) divided by 3121 total samples in all training sets. 


Fig. 3.6. APPARENT INCREASE IN TRAINING SET ACCURACY 
ACHIEVED AT EACH LEVEL OF STATISTICAL 
CLEANING. Fourteen classes are represented based 
on classification by the 10 MSS bands/ratios. 
Vertical axis: apparent training set accuracy (%). Horizontal 
axis: LANDSAT MSS band or ratio added. 
Fig. 3.7. ACTUAL INCREASE IN TRAINING SET ACCURACY 
ACHIEVED AT EACH LEVEL OF STATISTICAL 
CLEANING. Fourteen classes are represented based 
on classification by the 10 MSS bands/ratios. 
Horizontal axis: LANDSAT MSS band or ratio added. 
The low accuracy achieved with this approach has two probable 
causes. (1) The location of the ground control data on the graymaps 
may be unsatisfactory. A misregistration of only one cell between the 
ground control and the LANDSAT cells may well place the sample on a 
different material. Taiwan is a country of small scale, intensive, 
heterogeneous agricultural and other land uses. Thus, a near miss may 
be as serious as a gross misregistration. (2) The method used here 
well represents the natural variability in each land cover class, so 
multimodal distributions may result for the radiance values in a 
specific band/ratio. The reduction of the number of categories from 
17 to 14 further increases the multimodal nature of the classes. Here 
the number of agricultural classes was reduced to three from nine in 
the non-supervised approach to match what the airphoto interpreters 
could distinguish on the available black and white photos. Thus, 
the radiance distribution for the cells in a given class may not follow 
the Gaussian distribution assumed in selecting the discriminant 
analysis technique. 

3.2.6 Pseudo-Supervised Method 

Pseudo-supervised, that is, "like"-supervised, training data 
were developed using a combination of the available ground control 
information and careful examination of "mappable" classes by 
inspection of the natural variation and homogeneity in the MSS band 5 
and 7 graymaps. First, the proposed classification scheme (Table 2.3) 
was examined with reference to the graymaps to determine whether it 
should be slightly adjusted to represent the number and type of land 
use/land cover classes which appear mappable. Next, one or more 
irregular areas were located on the graymaps which were thought to 
represent these mappable classes and which correspond to an area 
covered by one of the 25 ground control maps described earlier 
(Section 2.3.2.2). Finally, a rectangular or irregular training set 
selection identified from the ground control map was fitted into the 
homogeneous area on the graymaps (Fig. 3.8). This process was 
repeated to provide at least three examples of each class containing a 
total of about 100 cells, except for hardwoods type B which contains 
246 cells. This procedure overcomes the registration problems of 
the grid cells approach, as the final location of the training set is 
determined from the graymap while the identity of the class is taken 
wherever possible from the airphoto interpretations. Actual specific 
identity was not possible for the agricultural classes due to the 
mismatch in the dates of the LANDSAT and airphoto images. A small 
number of the classes could not be represented by reasonably sized 
rectangular training sets, e.g. the classes which represent highly 
linear or point distributed land use or land cover. These classes 
were represented by a larger number of carefully selected irregular 
shapes and collections of discrete points (e.g. urban land covers). 

No ground control maps were available for offshore areas of tidal 
flats and areas of seawater. The training sets used here were 
assembled from a long, narrow strip of cells following a depth 
contour on the topographic map. 

Fig. 3.8. AN EXAMPLE OF THE SELECTION OF TRAINING SETS 
BY THE "PSEUDO-SUPERVISED" APPROACH. Two 
rectangles are assigned to the classes of grains and 
gravels respectively. 
The pseudo-supervised training data performed well, 
as they represented an accumulation of knowledge and experience gained 
from the two earlier approaches. Test classification and evaluation 
proceeded as outlined in detail for the non-supervised approach. The 
overall and individual class accuracies were interpreted from the 10 
band/ratio training set accuracy matrix (Table 3.3; Appendices C and D). 
An overall 20 class accuracy of 77.6% was achieved, ranging from 45% for 
one class of grain to 99% for both medium and deep seawaters. 

Lower accuracies were achieved for several of the agricultural 
classes and for the urban residential class, prompting re-examination 
of the misclassification between these classes and their revision 
to eliminate it. This iterative approach to the selection of the 
classes was quite in keeping with the type of training set selection 
approach employed here. The agricultural classes were revised 
down from 7 to 5 more mappable types and the urban residential class 
was combined into the mixed urban class with which it was confused. 
This reduction of the 20 initial classes to 17 improved the overall 
training set accuracy to 85% with a low of 71% for one of the crop 
classes (Table 3.4). 

Two iterations of statistical cleaning were applied to the train- 
ing sets in a stepwise fashion yielding apparent increases in accuracy 
at each level (Fig. 3.9). Examination of the three curves for actual 
training set accuracy implies that no real effect has been achieved 


TABLE 3.3. TRAINING SET CLASSIFICATION ACCURACY USING THE “PSEUDO-SUPERVISED” TRAINING SETS. 20 classes 
showing training set accuracy in percent using 10 channels (4 LANDSAT MSS bands and their 6 ratios). 

                               Urban       Agricultural                Forested    Barren      Water 
Code  Land Use Class    Points 110 120 130 211 212 213 214 221 222 223 311 312 320 410 420 430 510 520 530 540 

100   Urban lands 
110   Commercial           100  90   3   6   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0 
120   Residential           85  15  51  24   1   0   0   0   0   0   0   0   0   0   6   4   0   0   0   0   0 
130   Mixed                 72   0  31  65   3   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0 

200   Agricultural lands 
211   Grain A               82   0   2   6  66   0  22   1   0   2   0   0   0   0   0   0   0   0   0   0   0 
212   Grain B               96   0   0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
213   Grain C               91   0   0   0  23   3  45   ?  19   9   0   0   1   0   0   0   0   0   0   0   0 
214   Grain D              101   1   2   2   8  23   1   ?   ?   0   0   0   1   0   0   0   0   0   0   0   0 
221   Crop A               104   0   0   0   0   0  12  18  50  20   0   0   0   0   0   0   0   0   0   0   0 
222   Crop B               100   0   0   0   2   0   2   0  15  77   0   0   4   0   0   0   0   0   0   0   0 
223   Crop C                75   0   0   0   0   0   0  31   3   0  61   0   0   5   0   0   0   0   0   0   0 

300   Forested lands 
311   Hardwoods A          156   0   0   0   0   0   0   0   4   1   0  80  14   0   0   0   0   0   0   0   1 
312   Hardwoods B          246   0   0   0   0   2   0   3   4   0   0  13   ?   ?   0   0   0   0   0   0   0 
320   Conifers              93   0   0   0   0   0   0   0   3   1   0   4  10  82   0   0   0   0   0   0   0 

400   Barren lands 
410   Gravels               95   0   7  19   1   0   0   0   0   0   0   0   0   0  72   1   0   0   0   0   0 
420   Reclaimed             70   1   3   4   0   0   0   0   0   0   0   0   0   0   0  91   0   0   0   0   0 
430   Tidal flat            96   6   0   0   0   0   0   0   0   0   0   0   0   0   0   0  94   0   0   0   0 

500   Water surfaces 
510   Shallow seawater      90   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   4  96   0   0   0 
520   Medium seawater       90   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  99   0   1 
530   Deep seawater         90   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  99   1 
540   Fresh water           84   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   4   1   0   1  94 

Points = number of points in the training set. ? = entry illegible in the scan; the Grain C diagonal (45) is taken from the text. 

Overall average = 77.6% obtained by 1565 correct identifications (diagonal) divided by 2016 total samples in all training sets. 


TABLE 3.4. TRAINING SET CLASSIFICATION ACCURACY USING THE “PSEUDO-SUPERVISED” TRAINING SETS. 
17 classes showing training set accuracy in percent using 10 channels (4 LANDSAT MSS bands and their 6 ratios). 

                               Urban   Agricultural        Forested    Barren      Water 
Code  Land Use Class    Points 110 120 211 212 221 222 223 311 312 320 410 420 430 510 520 530 540 

100   Urban lands 
110   Commercial           100  91   8   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0 
120   Mixed                157  13  73   3   0   0   0   0   0   0   0  10   3   0   0   0   0   0 

200   Agricultural lands 
211   Grain A               82   0   2[87]   0   2   4   0   0   0   0   5   0   0   0   0   0   0 
212   Grain B               96   0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0 
221   Crop A               104   0   0   1   3  75  20   0   0   1   0   0   0   0   0   0   0   0 
222   Crop B               100   0   0   2   0  19  75   0   0   4   0   0   0   0   0   0   0   0 
223   Crop C                75   0   0   0   7  16   0  71   0   7   0   0   0   0   0   0   0   0 

300   Forested lands 
311   Hardwoods A          156   0   0   0   0   3   1   0  81  13   0   0   0   0   0   0   0   1 
312   Hardwoods B          246   0   0   0   2   4   0   0  14  79   0   0   0   0   0   0   0   0 
320   Conifers              93   0   0   0   1   3   1   0   4   8  83   0   0   0   0   0   0   0 

400   Barren lands 
410   Gravels               95   0  19   1   0   0   0   0   0   0   0  79   1   0   0   0   0   0 
420   Reclaimed lands       70   4   4   0   0   0   0   0   0   0   0   0  91   0   0   0   0   0 
430   Tidal flat            96   5   0   0   0   0   0   0   0   0   0   0   0  95   0   0   0   0 

500   Water surfaces 
510   Shallow seawater      90   0   0   0   0   0   0   0   0   0   0   0   0   3[97]   0   0   0 
520   Medium seawater       90   0   0   0   0   0   0   0   0   0   0   0   0   0   0  99   0   1 
530   Deep seawater         90   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  99   1 
540   Fresh water           84   0   0   0   0   0   0   0   0   0   0   0   0   5   0   0   0  95 

Points = number of points in the training set. Bracketed entries are missing from the scan and are restored from Table 3.7. 

Overall accuracy = 85% obtained by 1551 correct identifications (diagonal) divided by 1824 total samples in all training sets. 
Fig. 3.9. APPARENT INCREASE IN TRAINING SET ACCURACY 
ACHIEVED AT EACH LEVEL OF STATISTICAL 
CLEANING. Seventeen classes are represented based 
on classification by the 10 MSS bands/ratios. 
Horizontal axis: LANDSAT MSS band or ratio added. 
(Fig. 3.10). Approximately 82% actual training set accuracy was 
achieved with an optimal combination of four bands/ratios versus 
the 85% achieved with all 10 bands/ratios. 

3.2.7 Conclusion and Selection 

Statistical cleaning was tested as a method of improving the 
representation of a class by the cells remaining in the training sets 
after cleaning. Those cells with low probability of belonging to the 
class and/or high probability of belonging to another class were 
deleted. A measure of the effectiveness of the new discriminant 
function computed after cleaning is the fate of those cells originally 
selected as part of the training set but which were neither correctly 
classified nor sufficiently different to be deleted. If cleaning were 
effective, these cells would be drawn back into the correct class, 
yielding higher actual training set accuracy with successive cleaning 
iterations. This did not happen (Fig. 3.11). Just the opposite 
occurred with two of the three training set selection approaches, 
which showed a slight decrease in actual training set accuracy 
with each cleaning. Apparent training set accuracy, by contrast, 
increases in all cases in direct linear proportion to the number of 
cells deleted (Fig. 3.11). Training sets which are noisy may well be 
improved by statistical cleaning, but the best way to improve low 
training set accuracy is to reselect better or more representative 
training sets. 

Fig. 3.10. ACTUAL INCREASE IN TRAINING SET ACCURACY 
ACHIEVED AT EACH LEVEL OF STATISTICAL 
CLEANING. Seventeen classes are represented based 
on classification by the 10 MSS bands/ratios. 
Horizontal axis: LANDSAT MSS band or ratio added. 



Fig. 3.11. COMPARISON OF THE ACTUAL VERSUS THE APPARENT 
TRAINING SET ACCURACY RESULTING FROM 
STATISTICAL CLEANING. All 10 bands/ratios were 
used. Note that the apparent increase in accuracy is 
directly proportional to the number of points deleted for 
each of the three approaches. 
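
The difference between apparent and actual accuracy under statistical cleaning can be illustrated with a short Python sketch. It assumes that apparent accuracy is computed over the cells remaining after cleaning, while actual accuracy is computed over all cells originally selected; the function name and the cell counts are hypothetical and are not taken from this study.

```python
def cleaning_accuracies(original: int, deleted: int, correct: int):
    """Return (apparent, actual) accuracy in percent after one cleaning level.

    original: cells first selected for the training sets
    deleted:  cells removed so far by statistical cleaning
    correct:  cells classified into their own class
    """
    remaining = original - deleted
    if remaining <= 0:
        raise ValueError("cleaning removed every cell")
    apparent = 100.0 * correct / remaining  # denominator shrinks with cleaning
    actual = 100.0 * correct / original     # denominator stays fixed
    return apparent, actual


# Hypothetical case: the number of correct cells does not change with cleaning.
for deleted in (0, 50, 100):
    apparent, actual = cleaning_accuracies(1000, deleted, 800)
    print(deleted, round(apparent, 1), round(actual, 1))
```

Under these assumptions the apparent figure rises merely because cells are removed from the denominator, which is the pattern described for Fig. 3.11.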
A final evaluation of the value of statistical cleaning must await 
a future test of its impact upon map verification accuracy. This 
requires that after each cleaning iteration the residual points in the 
training set be used to compute a discriminant function. These new 
discriminant functions for each level of cleaning could be used to 
compute a series of land use/land cover classification maps. These test 
maps can be compared with ground control information not known or 
used in selecting the training sets. The resulting verification 
accuracy for each successive map prepared at each cleaning iteration 
would provide a more definitive assessment of the value of statistical 
cleaning. 

Final selection of the pseudo-supervised training sets without 
statistical cleaning was obvious when the results of the three 
approaches were compared at all levels of classification (Table 3.5). 
Some differences exist in the types of land cover classes selected for 
each of these tests, and one to one comparisons were not possible at 
all 2nd and 3rd levels. The final choice was made based upon 
comparison of the first order accuracy, which was quite high for the 
approach selected. First order class accuracy is computed as 100 
times the number of cells classified into any 2nd or 3rd order 
subclass of the correct first order class, divided by the total original 
number of cells representing those subclasses. The training set accuracies 
achieved represent how well the training sets would work in preparing 


TABLE 3.5. COMPARATIVE TRAINING SET ACCURACY OF THREE APPROACHES TO COMPUTING TAIWAN LAND USE 
FROM LANDSAT IMAGERY. The percentages indicate the number of training set points placed in the correct class or 
combination of classes relative to the total number of points originally selected to represent the class(es). NC indicates 
“not classified.” Brackets indicate that there is not a 1 to 1 correspondence in the number of subdivisions attempted in the 
specific approach. 

Columns: “Non-Supervised” Training Sets (Levels I, II, III); “Supervised” Training Sets (Levels I, II, III); 
“Pseudo-Supervised” Training Sets (Levels I, II, III). 

Land use classes listed: 
Level I: 100 Urban lands; 200 Agricultural lands; 300 Forested lands; 400 Barren lands; 500 Water surfaces 
Level II: 110 Commercial; 120 Residential; Mixed; 210 Grains; 220 Crops; 230 Orchards; 310 Hardwoods; 
320 Conifers; 410 Gravels; 420 Reclaimed land; 430 Tidal flat; 510 Shallow seawater; 520 Medium seawater; 
530 Deep seawater; 540 Fresh water; 550 Clear water 
Level III: 211 Rice A; 212 Rice B; 213 Rice C; 214 Rice D; 215 Rice E; 221 Crop A; 222 Crop B; 223 Crop C; 
231 Citrus; 232 Pears; Type A 

Level I, 100 Urban lands: NC (non-supervised), 55% (supervised), 91% (pseudo-supervised). 

Remaining entries in scan order (cell positions not recoverable from the scan): 
71, 56, 94, 69, 70, 97, 56, 62, 96, 69; 110 Commercial: NC, 73%; 120 Residential: NC; Mixed: NC; 
200 Agricultural lands: 69; 210 Grains: 94, 89; NC, 100, NC, 100, NC, NC, NC, NC, 55, NC, NC, 64, 69, 58, 
NC, 94, 94, NC, 14, 79, 97, 99, 99. 
a first order or generalized land use/land cover map with five 
classes. The pseudo-supervised approach provides the highest 
training set accuracy for all first order classes, ranging from 89% 
for barren lands to 98% for water surfaces. The overall first order 
or five class training set accuracy for the pseudo-supervised 
approach is 94%, while it is 79% and 72% for the non-supervised and 
supervised approaches respectively. 
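
The first order accuracy computation described in this section can be sketched in Python as follows. The sketch assumes that a cell counts as correct at the first order whenever its assigned code falls in the same hundreds group as its true code (for example 110 and 120 both belong to 100); the function name and the cell counts are hypothetical.

```python
from collections import defaultdict


def first_order_accuracy(counts):
    """counts maps (true_code, assigned_code) -> number of cells.

    Returns {first_order_code: percent of cells assigned to any subclass
    of the correct first order class}.
    """
    hit = defaultdict(int)
    total = defaultdict(int)
    for (true_code, assigned_code), n in counts.items():
        level1 = true_code // 100 * 100
        total[level1] += n
        if assigned_code // 100 * 100 == level1:
            hit[level1] += n
    return {code: 100.0 * hit[code] / total[code] for code in total}


# Hypothetical counts for two urban subclasses of 100 cells each.
example = {
    (110, 110): 90, (110, 120): 8, (110, 420): 2,
    (120, 110): 13, (120, 120): 80, (120, 410): 7,
}
print(first_order_accuracy(example))  # {100: 95.5}
```

Confusion between two subclasses of the same first order class does not lower the first order figure, which is why it can exceed the accuracy of every individual subclass.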

3.3 Selection of Optimal MSS Bands/Ratios 

It is not economical to employ all available MSS bands and 
ratios to classify the complete maps. It is also not necessary, as the 
training set accuracy approaches an upper limit after three or four 
bands have been added for agricultural and urban land use/land cover 
classification, whether measured as overall actual training set 
accuracy (Fig. 3.12) or first order accuracy (Table 3.6). This 
conclusion is supported by other experiments employing a similar test 
procedure with aircraft imagery in 12 spectral bands ranging 
from 0.4 µm to 12.5 µm, covering the wider range from the visible 
spectral region through the thermal infrared (Thompson et al., 1974). 
No ratios were tested, but three of the four aircraft bands selected 
either overlapped or included the four MSS bands. The study of the 
aircraft imagery also clearly illustrated the risk of extending these 
conclusions to all types of classification mapping, as 8 of the 12 
spectral bands were required to achieve an optimal accuracy for 
classifying surficial geology classes. 

Fig. 3.12. SELECTION OF THE MINIMUM NUMBER OF MSS 
BANDS/RATIOS FOR PREPARATION OF THE TAIWAN 
LAND USE/LAND COVER MAPS. Seventeen classes. 
Based on actual training set accuracy using the "pseudo- 
supervised" training sets. No cleaning has been applied. 
Horizontal axis: LANDSAT MSS band or ratio added. 


TABLE 3.6. TRAINING SET CLASSIFICATION ACCURACY FOR THE 1st ORDER LAND USE CLASSIFICATION OF 
TAIWAN. Based on the “Pseudo-Supervised” training sets (Table 3.4). No cleaning has been applied. 

                          Maximum Achievable  Optimal 4-Band/  *Accuracy Gain (+)  Original 4 MSS  *Accuracy Gain (+) 
Code  Land Use Class      10-Band/Ratio       Ratio Accuracy   or Loss (-)         Band Accuracy   or Loss (-) 
                          Accuracy 

100   Urban lands              91%                 91%               0%                 90%              -1% 
200   Agricultural lands       96                  94               -2                  89               -7 
300   Forested lands           94                  91               -3                  91               -3 
400   Barren lands             89                  92               +3                  90               +1 
500   Water surfaces           98                  98                0                  98                0 

*Comparison with achievable 10-band accuracy. 

Four MSS bands/ratios provide the basis for a reasonable 17 
class land use/land cover map of Taiwan (Table 3.7). A further 
test remains as to the actual contribution of the six ratios of MSS 
bands relative to the use of only the four MSS bands. Ratios of MSS 
bands as a whole correlated well with one or both of their own 
numerator or denominator bands (Table 3.8) and thus may contribute 
little to the classification process selected. A stepwise discriminant 
analysis of the actual training set accuracy for the 17 classes 
using only the four MSS bands clearly shows that employing ratios will 
contribute little to the final accuracy achieved (Fig. 3.12). The four 
MSS bands alone yield an overall training set accuracy of 79% while 
the first four MSS bands/ratios provided 81%. 
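
The tendency of a band ratio to correlate with its own numerator or denominator can be illustrated with a small Python sketch. The band values below are synthetic and independent, generated only for illustration; the sketch computes a plain Pearson correlation over all cells, not the pooled within-class correlation reported in Table 3.8, and the function name is an assumption.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic radiance values standing in for MSS bands 5 and 7.
rng = random.Random(0)
band5 = [rng.uniform(20, 60) for _ in range(500)]
band7 = [rng.uniform(10, 50) for _ in range(500)]
ratio75 = [b7 / b5 for b5, b7 in zip(band5, band7)]

print("ratio 7/5 vs band 7:", round(pearson(ratio75, band7), 2))
print("ratio 7/5 vs band 5:", round(pearson(ratio75, band5), 2))
```

Even with unrelated bands, the ratio is positively correlated with its numerator and negatively correlated with its denominator, so a ratio carries little information that is independent of the bands it is built from.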

Savings can thus be achieved by omitting the step used to 
prepare the MSS ratios for related land cover mapping. Also, 
substantial additional savings can be achieved by noting that bands 5 
and 7 alone provide the same overall training set accuracy (79%) as 
all four MSS bands (Fig. 3.12). Thus, two of the four MSS bands 
employed with the 17 class pseudo-supervised training sets provide 
adequate overall accuracy and first order accuracies of better than 90%. 
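
The idea of adding one band or ratio at a time and watching the accuracy level off can be sketched in Python as below. A plain greedy forward search stands in for the stepwise discriminant procedure (Appendix B), whose actual entry criterion is not reproduced here; the function name, the channel labels and the toy scoring table are all hypothetical and are not results from this study.

```python
def forward_select(channels, score):
    """Greedy forward selection.

    channels: candidate channel names
    score:    function mapping a list of channels to an accuracy value
    Returns [(channel added, score after adding it), ...] in order of entry.
    """
    chosen, history = [], []
    remaining = list(channels)
    while remaining:
        # Pick the channel that gives the best score when added next.
        best = max(remaining, key=lambda c: score(chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
        history.append((best, score(chosen)))
    return history


# Toy scoring table: each hypothetical channel adds a fixed amount.
TOY_GAIN = {"A": 60, "B": 25, "C": 3, "D": 1}


def toy_score(subset):
    return sum(TOY_GAIN[c] for c in subset)


for channel, value in forward_select(["D", "C", "B", "A"], toy_score):
    print(channel, value)
```

In practice the scoring function would be the training set accuracy obtained with the chosen channels, and the search would stop once further additions give a negligible gain.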

It may appear at this point that a substantial amount of the pro- 
cedures tested have contributed little to the final processes employed. 


TABLE 3.7. FINAL TRAINING SET ACCURACY FOR THE LAND USE CLASSIFICATION MAPS OF TAIWAN. Based on the 
“Pseudo-Supervised” training sets (Table 3.5). No cleaning has been applied. All values in percent. 

                                   Maximum Achievable  Optimal 4-Band/  *Accuracy Loss   Original 4 MSS  *Accuracy Loss 
Level  Code  Land Use Class        10-Band/Ratio       Ratio Accuracy   (-) or Gain (+)  Band Accuracy   (-) or Gain (+) 
                                   Accuracy 

I      100   Urban lands                91                  91                0                90               -1 
II     110   Commercial                 91                  89               -2                92               +1 
II     120   Mixed                      73                  69               -4                67               -6 
I      200   Agricultural lands         96                  94               -2                89               -7 
II     210   Grains                     94                  90               -4                93               -1 
III    211   Rice A                     87                  78               -9                85               -2 
III    212   Rice B                    100                 100                0                98               -2 
II     220   Crops                      92                  87               -5                80              -12 
III    221   Crop A                     75                  65              -10                56              -19 
III    222   Crop B                     75                  78               +3                72               -3 
III    223   Crop C                     71                  61              -10                60              -11 
I      300   Forested lands             94                  91               -3                91               -3 
II     310   Hardwoods                  94                  91               -3                86               -8 
III    311   Type A                     81                  67              -14                67 
III    312   Type B                     79                  71               -8                68 
II     320   Conifers                   83                  75               -8                80               -3 
I      400   Barren lands               89                  92               +3                90               +1 
II     410   Gravels                    79                  83               +4                84               +5 
II     420   Reclaimed land             91                  94               +3                89               -2 
II     430   Tidal flat                 95                  93               -2                96               +1 
I      500   Water surfaces             98                  98                0                98                0 
II     510   Shallow seawater           97                  94               -3                97                0 
II     520   Medium seawater            99                 100               +1                93               -6 
II     530   Deep seawater              99                  99                0                87              -12 
II     540   Fresh water                95                  98               +3                91               -4 

             Overall accuracy           85                  81               -4                79               -6 

*Comparison with achievable 10-band accuracy. 
TABLE 3.8. CORRELATION MATRICES WITHIN CLASSES (POOLED) FOR THE 
THREE TYPES OF TRAINING SETS. No cleaning has been applied. 

(1) Non-Supervised Training Sets 

Bands/ 
Ratios       4      5      6      7    5/4    6/4    7/4    6/5    7/5    7/6 
4         1.00 
5         0.72   1.00 
6         0.34   0.41   1.00 
7         0.16   0.20   0.81   1.00 
5/4       0.20   0.80   0.31   0.18   1.00 
6/4      -0.14   0.01   0.84   0.75   0.16   1.00 
7/4      -0.20  -0.12   0.62   0.89   0.04   0.81   1.00 
6/5      -0.22  -0.40   0.58   0.56  -0.39   0.80   0.70   1.00 
7/5      -0.24  -0.40   0.42   0.70  -0.36   0.65   0.87   0.84   1.00 
7/6      -0.12  -0.08  -0.10   0.36   0.01  -0.03   0.43  -0.02   0.44   1.00 

(2) Supervised Training Sets 

Bands/ 
Ratios       4      5      6      7    5/4    6/4    7/4    6/5    7/5    7/6 
4         1.00 
5         0.88   1.00 
6         0.43   0.39   1.00 
7         0.19   0.15   0.91   1.00 
5/4       0.55   0.86   0.32   0.15   1.00 
6/4      -0.19  -0.18   0.78   0.85  -0.04   1.00 
7/4      -0.26  -0.26   0.67   0.87  -0.13   0.94   1.00 
6/5      -0.43  -0.57   0.47   0.61  -0.53   0.85   0.84   1.00 
7/5      -0.43  -0.54   0.45   0.68  -0.48   0.83   0.91   0.95   1.00 
7/6      -0.22  -0.22   0.14   0.45  -0.12   0.31   0.54   0.31   0.53   1.00 

(3) Pseudo-Supervised Training Sets 

Bands/ 
Ratios       4      5      6      7    5/4    6/4    7/4    6/5    7/5    7/6 
4         1.00 
5         0.93   1.00 
6         0.72   0.72   1.00 
7         0.49   0.49   0.87   1.00 
5/4       0.34   0.64   0.41   0.29   1.00 
6/4      -0.18  -0.12   0.53   0.64   0.10   1.00 
7/4      -0.23  -0.19   0.39   0.71   0.01   0.88   1.00 
6/5      -0.28  -0.37 
Hindsight does not substitute for foresight. The impact of these 
logical test procedures on the Taiwan land use/land cover mapping 
was unknown at the outset. These subsequent tests provided a 
logical, scientific basis for the optimal selection of the specific training 
sets and MSS bands actually used to prepare the final land use/land 
cover classification maps. 


IV. PRODUCTION AND VERIFICATION OF LAND USE/ 
LAND COVER MAPS OF TAIWAN 


4.1 Predictive Accuracy of Training Sets 

4.1.1 Additional Consideration of the Classification Algorithm 

Two classification algorithms were available for the production 
of the final maps. These were the maximum likelihood ratio tech- 
nique (Appendix E) and stepwise discriminant analysis which was dis- 
cussed in detail earlier (Appendix B). They are basically the same general 
approach except that stepwise discriminant analysis proceeds in a 
step by step (band by band) fashion and uses a single, common co- 
variance matrix for all classes. The maximum likelihood technique 
processes only the designated bands and uses a different, individual 
covariance matrix for each individual class sought. 
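
The practical difference between a single common covariance matrix and an individual covariance matrix per class can be seen in a two-band Python sketch. It is an illustration only, not the programs of Appendices B and E: it assumes Gaussian classes with equal prior probabilities, handles only two bands so that the matrix inverse can be written out by hand, and uses hypothetical training cells and function names.

```python
import math


def mean_cov(samples):
    """Mean and 2x2 covariance, stored as (a, b, c) for [[a, b], [b, c]]."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    a = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    b = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    c = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    return (mx, my), (a, b, c)


def pooled_cov(groups):
    """Single common covariance: class covariances weighted by n - 1."""
    dof = sum(len(g) - 1 for g in groups)
    covs = [(len(g) - 1, mean_cov(g)[1]) for g in groups]
    return tuple(sum(w * cv[i] for w, cv in covs) / dof for i in range(3))


def score(x, mean, cov):
    """Gaussian log-likelihood of cell x, dropping the constant term."""
    a, b, c = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    mahalanobis = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return -0.5 * (math.log(det) + mahalanobis)


def classify(x, groups, common):
    """Index of the best class; common=True uses one shared covariance."""
    stats = [mean_cov(g) for g in groups]
    shared = pooled_cov(groups)
    scores = [score(x, m, shared if common else cv) for m, cv in stats]
    return scores.index(max(scores))


# Hypothetical two-band training cells: a tight class and a broad class.
tight = [(10, 10), (11, 10), (10, 11), (11, 11)]
broad = [(20, 20), (30, 20), (20, 30), (30, 30)]
cell = (16, 16)
print(classify(cell, [tight, broad], common=True))   # 0
print(classify(cell, [tight, broad], common=False))  # 1
```

With a common covariance the cell goes to the nearer class mean, while with individual covariances it goes to the broad class, whose spread makes the cell far more plausible. A class covariance with a determinant near zero makes the division in `score` fail, which is the inversion problem described for the band ratios.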

The stepwise approach has already proved valuable for 
examining the various types of training data and the contribution of each 
spectral band/ratio, and for establishing that the ratios of spectral bands 
add little or nothing to the map classification undertaken here 
(Fig. 3.12). Further, stepwise discriminant analysis established 
that when it was restricted to select from only the four basic MSS 
bands it achieved an equally good training set accuracy with only two 
basic MSS bands as with a free selection from all 10 MSS bands/ 
ratios. It now remains to select one of the two classification 
techniques: either the discriminant analysis approach with a 
single covariance matrix or the maximum likelihood approach with 
its suite of covariance matrices. Unfortunately, the available 
likelihood technique cannot currently handle the six ratios of bands and 
must be restricted to the four basic MSS bands. Ratios of MSS 
bands contain considerably less variability than the four basic MSS 
bands, and the covariance for some of these band ratios is very small. 
The available maximum likelihood computer programs cannot invert 
the resulting matrices, as the method requires. 

At this point it was necessary to devise a method to choose 
one of these two approaches: discriminant analysis using the optimal 
subset of 10 bands/ratios determined in stepwise fashion, or 
likelihood ratioing using an optimal combination of the four basic MSS 
bands. Also, it was important to check how well the earlier 
interpretations of training set accuracies extended to the actual 
classification of the maps. These tests were accomplished by preparing a 
1/25 sample map of the study area and classifying it with both 
techniques and the classification matrices computed from the 
pseudo-supervised training sets. The results for each classification of the 
sample map were compared to the land use/land cover results 
obtained from the extensive grid sampled ground control data 
procedures outlined earlier. This grid sampled ground control was 
not employed in forming the pseudo-supervised training sets, due 
principally to its sensitivity to exact registration. However, it does 
provide a very good measure of the amounts of each land use present 
on each map for these tests. A verification procedure has thus been 
designed around this known photointerpretation result to provide a 
final selection of the technique and the bands to be employed and to 
predict in advance the general accuracy of the final products. 

4.1.2 Sample Map Classification 

Cost prohibited classifying each entire map image file with 
each available technique and combination of bands/ratios, as was done 
with the training data. The pseudo-supervised training sets provide 
a basis for computing a sample classification map by each technique 
to obtain a predictive measure of the expected accuracy. A 
systematic 1/25 sample was extracted for these tests using cells at every 
fifth line and every fifth column from the Taichung and Kuo-Hsing map 
image files. The Lu-Kang map image file was not sampled, as 
two-thirds of it was water areas whose classification could not be 
verified by the available airphoto ground control data. These miniature, 
sampled map image files each contain 5,600 cells, which were 
classified for comparison by both the discriminant analysis and maximum 
likelihood approaches. Using the results of the prior chapter as a 
guideline (Fig. 3.12), maximum likelihood was employed on all four MSS 
bands and on bands 5 and 7, while discriminant analysis was employed in 
a stepwise fashion on all 10 MSS bands and ratios. The results of 
these 12 map classifications are tabulated in terms of the percent of 
the area of each map which is classified into each land use/land 
cover by each technique (Tables 4.1 and 4.2). 
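
The systematic 1/25 sample, taking cells at every fifth line and every fifth column, can be sketched in Python as follows. The image dimensions used here are hypothetical, chosen only so that the sample contains 5,600 cells; the actual dimensions of the map image files are not given, and the function name is an assumption.

```python
def systematic_sample(image, step=5):
    """Keep every `step`-th line and every `step`-th column of a 2-D grid."""
    return [row[::step] for row in image[::step]]


# Hypothetical 350-line by 400-column image; each cell records its position.
lines, columns = 350, 400
image = [[(r, c) for c in range(columns)] for r in range(lines)]

sample = systematic_sample(image)
print(len(sample), len(sample[0]), len(sample) * len(sample[0]))  # 70 80 5600
```

Because the same lines and columns are retained for every band, the sampled cells stay registered across bands and ratios and can be classified exactly like the full image.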

4.1.3 Verification of Sample Maps 

Qualitative and quantitative verification techniques were 
applied to examine the sample map classification results. The 
qualitative approach was based upon an examination of the overall 
appearance of the sample classification maps and tables. The spatial 
distribution of the three classes of seawater was a good indicator of 
the general predictive accuracy. The Taichung and Kuo-Hsing 
classification maps cover only land areas or fresh water; thus the 
seawater (sediment/depth) classes should not appear on either map. 
A small area of the class of tidal flats does occur in the upper left 
corner of the Taichung map but is not confused with shallow (or 
highest sediment) seawater (Table 4.1). Essentially no sample cells 
are assigned to the seawater classes on the Taichung map by any of 
the 12 classifications tested. Proportionally more agricultural 
lands than forested lands are identified on the Taichung map of the 
lower coastal plains. Deep (or clearer) seawater is mapped at 3.6% 
on the higher elevation Kuo-Hsing map by the stepwise discriminant 
approach at the addition of the fifth band/ratio (6/4, 5/4, 6, 4, and 5) 
(Table 4.2). Kuo-Hsing represents an area of considerable 
topographic relief and yields areas of shadow which are confused with 
the deep seawater class by the discriminant analysis approach. 
There is no corresponding confusion in the Kuo-Hsing classification 
maps prepared from the two or four MSS bands by the maximum 
likelihood approach. The overall total amounts of each land use/land 
cover mapped by either approach are approximately the same. 


TABLE 4.1. CLASSIFICATION RESULTS FOR THE TAICHUNG MAP BASED ON 5600 SAMPLED CELLS. 17 classes showing 
the relative amounts of land uses in percent using “pseudo-supervised training data.” Sampled points represent every 5th 
line and column. Stepwise discriminant analysis used for the 10 channels (4 LANDSAT MSS bands and their 6 ratios). 
Maximum likelihood ratioing technique used for MSS bands 4, 5, 6, 7 and 5, 7. 

TABLE 4.2. CLASSIFICATION RESULTS FOR THE KUO-HSING MAP BASED ON 5600 SAMPLED CELLS. 17 classes showing 
the relative amounts of land uses in percent using “pseudo-supervised training data.” Sampled points represent every 5th 
line and column. Stepwise discriminant analysis used for the 10 channels (4 LANDSAT MSS bands and their 6 ratios). 
Maximum likelihood ratioing technique used for MSS bands 4, 5, 6, 7 and 5, 7. 


The quantitative test of these results statistically compares
the airphoto estimates of the amount of each first order land use/
land cover to the corresponding amounts computed for each of the 12
classification maps. It was not possible to make this comparison
at a second or third order of land use/land cover due to the slight
differences at these levels between the hierarchical classification
schemes used on the grid sampled airphotos and the pseudo-supervised
training data. The amount of the map occupied by a
given first order land use/land cover is computed for each sample
classification map in percent (Tables 4.1 and 4.2). It has also been
estimated by the grid sample airphoto interpretations (Table 2.5).
The LANDSAT image was obtained on November 1, 1972, while the
airphotos were obtained in 1973 and 1974; thus, on the average,
there is about a one year separation between these data. A plot can
be prepared for each of the 12 sample image classifications comparing
the computed amount in percent of each first order land
use/land cover against the airphoto estimates, also in percent. Since
there are five first order classes mapped for both the Kuo-Hsing and
Taichung maps, each plot contains 10 points (Fig. 4.1). An exact
match between the computed and estimated amounts of land use/land
cover would place all 10 points on a 45° line representing a 1 to 1
comparison. A test of how well these points fit the expected line
is given in Section 4.1.4. A second graph can be prepared which shows the
variation in difference between the computed and estimated amounts
of each first order land use/land cover for each of the 12 classifications
attempted (Fig. 4.2). This graph clearly indicates that MSS
bands 5 and 7 with the maximum likelihood approach provide the
most accurate rendition of the amounts of each of the major land uses/
land covers present on both maps. The difference between the computed
and estimated results also decreases rapidly for the first four
steps of the stepwise discriminant analysis and then remains relatively
constant. The fact that this
measure of comparison does not continue to decrease below a fixed
and relatively constant level implies that there is an inherent difference
between the computed and estimated results which cannot be
further improved by the addition of more spectral bands/ratios.
This may represent the amount of real change in land use/land cover
between the 1972 LANDSAT and 1973/74 airphoto dates or a systematic
difference or "error" in the two different approaches used to
obtain the computed and estimated amounts of land use/land cover.
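
The first order comparison described above can be sketched as follows. The sketch assumes that a first order class is obtained from a second or third order class code by keeping the hundreds digit (for example 211 becomes 200), which matches the coding used in Table 4.3. All cell counts and airphoto percentages below are hypothetical illustration values, not results from this study, and the function names are invented here.

```python
from collections import defaultdict


def first_order_percent(cell_counts):
    """Aggregate cell counts keyed by class code to first order classes, in percent."""
    totals = defaultdict(int)
    for code, count in cell_counts.items():
        totals[(code // 100) * 100] += count  # 211 -> 200, 320 -> 300, ...
    n_cells = sum(totals.values())
    return {code: 100.0 * totals[code] / n_cells for code in sorted(totals)}


def deviation_from_airphoto(computed_pct, airphoto_pct):
    """Computed minus estimated; negative means the airphoto estimate is larger."""
    return {code: computed_pct.get(code, 0.0) - airphoto_pct[code]
            for code in sorted(airphoto_pct)}


# Hypothetical counts for one 5,600-cell sample classification.
sample_counts = {110: 100, 120: 180, 211: 1000, 221: 2000, 222: 900,
                 311: 600, 312: 500, 320: 40, 410: 200, 430: 30, 540: 50}
# Hypothetical airphoto estimates in percent for the five first order classes.
airphoto_pct = {100: 6.0, 200: 68.0, 300: 21.0, 400: 4.0, 500: 1.0}

computed_pct = first_order_percent(sample_counts)
deviation = deviation_from_airphoto(computed_pct, airphoto_pct)
for code, diff in deviation.items():
    print(code, round(computed_pct[code], 2), round(diff, 2))
```

The sign convention follows the caption of Fig. 4.2, where a negative difference means the airphoto estimate exceeds the classification result.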


[Figure 4.1 plot. Axes: Airphoto Estimates (as % of Land Use);
Classification Results - 4 MSS Bands (as % of Land Use).]

Fig. 4.1. COMPARISON OF THE ESTIMATED TO COMPUTED
FIRST ORDER LAND USE/LAND COVER. Maximum
likelihood analysis was applied to the four MSS bands.
Seventeen second level classes were aggregated into
five first level classes for comparison. Airphoto estimates
based on 2760 sample cells (Table 2.5). Classification
results based on 5600 sample cells (Tables 4.1
and 4.2).

[Figure 4.2 plot. Axes: Difference from Airphoto Estimates;
LANDSAT MSS Band or Ratio Added.]

Fig. 4.2. DEVIATION OF FIRST ORDER LAND USE/LAND COVER
MAP CLASSIFICATION RESULTS FROM AIRPHOTO
ESTIMATES. A negative difference represents an airphoto
estimate greater than the classification results.
Isolated points for 2 and 4 band cases were computed by
maximum likelihood technique. Remaining values connected
by lines computed by stepwise discriminant analysis.

4.1.4 Selection of the Final Procedures

Just as with the training set tests, the combination of original
MSS bands 5 and 7 provides the most economical and accurate results
with the available classification algorithms (Fig. 4.2). One additional
comparison gives further weight to this conclusion. The comparison of
the computed and estimated amounts of first order land use/land
cover provides five points to approximate a 45° line, or 10 points if
both maps are taken together (Fig. 4.1). These test points do not
exactly fit the expected line, and the standard error of the estimate
provides a measure of their misfit as a group.
This standard error can be computed and plotted for each of the 12
sample classifications as a function of MSS band or ratio added, as
was done earlier to examine the training set accuracies (Fig. 4.3).
The standard error for MSS bands 5 and 7 processed by the maximum
likelihood approach is less than that for all four MSS bands.
Further, eight bands/ratios must be used in the stepwise fashion to
achieve the same results as with four MSS bands (Fig. 4.3).
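
A minimal sketch of the misfit measure is given below. The report does not state the formula it used, so this sketch assumes the standard error is the root mean square deviation of the points from the fixed 1 to 1 line, with divisor n because no line parameters are fitted. A regression-style divisor of n - 2 would give somewhat larger values. The input percentages are hypothetical.

```python
import math


def standard_error_about_identity(estimated, computed):
    """Root mean square deviation of (estimated, computed) pairs from the 45-degree line."""
    if len(estimated) != len(computed) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((c - e) ** 2 for e, c in zip(estimated, computed))
    return math.sqrt(squared_error / len(estimated))


# Hypothetical first order percentages for one map (five classes).
airphoto_estimates = [6.0, 68.0, 21.0, 4.0, 1.0]
classification_results = [5.0, 70.0, 20.0, 4.0, 1.0]

error = standard_error_about_identity(airphoto_estimates, classification_results)
print(round(error, 3))
```

Computing this value once per classification (two maximum likelihood runs and ten stepwise steps) yields the kind of curve plotted against band or ratio added in Fig. 4.3.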

The verification procedures employed here are not foolproof
but were selected to compensate for differences which evolved in
the two hierarchical land use/land cover classification schemes.
Certainly a one to one (cell by cell) comparison of the computed and
known land use/land cover of a sample of map cells would be more
rigorous. It is possible that the overall amounts of the first order
land use/land cover can match, as has been shown, while their spatial
distribution does not similarly correspond. The techniques used here do provide
a good indication of the optimal bands and technique to be applied for the
final map production. The similarity of these results to those obtained
earlier by examination of training set accuracy provides additional
confidence in these conclusions.

[Figure 4.3 plot. Axes: Standard Error of Estimate;
LANDSAT MSS Bands or Ratios Added.]

Fig. 4.3. SELECTION OF OPTIMAL LANDSAT BANDS AND
RATIOS FOR THE TAIWAN LAND USE/LAND COVER
CLASSIFICATION. Standard error of estimate is based
upon the difference between airphoto estimates and
classification results for the five aggregated first level
classes.

4.2 Map Production
4.2.1 Input 

The final land use/land cover classification maps were prepared from
MSS bands 5 and 7 by the maximum likelihood approach. Microfilm graymaps
of these two bands assign black to the image cells with very low spectral
returns and white to those with very high spectral returns (Figs. 4.4 and
4.5). These individual graymaps of MSS bands 5 and 7 clearly show some
of the land uses/land covers in detail. The graymap of MSS band 5 for
the Taichung map (Fig. 4.4b) displays the drainage pattern and urban lands
in white and forests and river channel in black. The rest of the land use/
land cover types are shown in different intermediate levels of gray. The
graymap of MSS band 7 for this same map (Fig. 4.5b) displays the drainage
pattern and urban lands in black and most of the agricultural lands in
white or light gray. The rough terrain on the eastern side of the Taichung
map is emphasized with rougher topography appearing as shaded relief. A

(a) Lu-Kang Map (b) Taichung Map (c) Kuo-Hsing Map

FIGURE 4.4. GRAYMAPS OF LANDSAT MSS BAND 5 FOR THE TEST SITE. Generated on
a microfilm plotter from the computer compatible tapes of the November 1,
1972 image. Scale ~1:200,000. Ten discrete gray levels were displayed.
The exact location of these maps is presented in Figure 2.1 (page 10).

(a) Lu-Kang Map (b) Taichung Map (c) Kuo-Hsing Map

FIGURE 4.5. GRAYMAPS OF LANDSAT MSS BAND 7 FOR THE TEST SITE. Generated on a
microfilm plotter from the computer compatible tapes of the November 1, 1972
image. Scale ~1:200,000. Ten discrete gray levels were displayed. The
exact location of these maps is presented in Figure 2.1 (page 10).


FIGURE 4.6. OVERALL 17 LAND USE/LAND COVER CLASSIFICATION MAPS. Extracted
by computer analysis of a November 1, 1972 LANDSAT image. Scale ~1:200,000.
Specific second and third order classes are annotated on the three maps
which follow. Compare detail with LANDSAT photointerpretation map in
Figure 2.2 (page 12).

(a) Lu-Kang Map (b) Taichung Map (c) Kuo-Hsing Map

FIGURE 4.7. FIRST ORDER LAND USE/LAND COVER CLASSIFICATION MAPS. Extracted by
computer analysis of a November 1, 1972 LANDSAT image. Scale ~1:200,000.
Specific first order classes are annotated on the maps.

order land use within that first order category (Figs. 4.8, 4.9, 4.10, 4.11,
and 4.12). Areas of the theme displays belonging to any of the four
remaining first order classes are displayed in white. Since no first order
category is subdivided into more than five second order classes, these theme
maps present a reasonable picture of the distribution of each land use/land
cover at the second order. Five gray levels or fewer were used in the first
order map and theme maps and can be distinguished by the observer when the
land use/land cover is distributed in uniform patches. Highly variable
spatial intermixes of cells of various land uses are still difficult to
distinguish and may only be properly displayed in differing colors, if at all.

4.2.2.1 Verification

Close examination of the theme maps provides a qualitative measure
of the accuracy of these final classification maps and the general sources
of remaining error. Urban lands were classified along the rivers and
coastal lines and appear as "error" in the classification maps (Fig. 4.8).
Dry sands occur along the river or coastal embankments and possess very
similar spectral characteristics in the two-dimensional spectral space
to the concrete roofs of the buildings which dominate the commercial
category of urban land use. The addition of spectral bands from LANDSAT images
taken on some other date or the overlay of ancillary data, such as the
distance from the center of the city to each cell processed, should decrease
this kind of classification error. Generally the distribution of forest
types is correlated with the aspect of the terrain. Overlays of slope and
aspect data onto the multispectral data should be incorporated into future
analysis schemes to improve the accuracy of the categorization of forest
types. The water surfaces of the various categories are distributed
parallel to the coastal line, but they seem to be more closely related
to the suspended sediment content than to the water depth. Adequate ground
control information can resolve this question and provide a basis for a
separation of these two general water categories. Bad scan lines and the
six-line problem due to the imbalance in the calibration of the six sensors
in the multispectral scanner were not removed or eliminated in this study.
The obvious error in the classification of portions of whole scan lines
and banding occurring in water surfaces was caused by these problems. More
advanced preprocessing and calibration techniques already demonstrated by
others can be employed to reduce the impact of these problems and improve
the classification results.

(a) Lu-Kang Map (b) Taichung Map (c) Kuo-Hsing Map

FIGURE 4.8. URBAN LAND USE/LAND COVER CLASSIFICATION THEMES. Extracted by
computer analysis of a November 1, 1972 LANDSAT image. Scale ~1:200,000.
Specific second order classes are annotated on the maps.

(a) Lu-Kang Map (b) Taichung Map (c) Kuo-Hsing Map

FIGURE 4.9. AGRICULTURAL LAND USE/LAND COVER CLASSIFICATION THEMES. Extracted
by computer analysis of a November 1, 1972 LANDSAT image. Scale ~1:200,000.
Specific second and third order classes are annotated on the maps.

(a) Lu-Kang Map (b) Taichung Map (c) Kuo-Hsing Map

FIGURE 4.10. FOREST LAND USE/LAND COVER CLASSIFICATION THEMES. Extracted by
computer analysis of a November 1, 1972 LANDSAT image. Scale ~1:200,000.
Specific second and third order classes are annotated on the maps.

(a) Lu-Kang Map (b) Taichung Map (c) Kuo-Hsing Map

FIGURE 4.11. BARREN LAND USE/LAND COVER CLASSIFICATION THEMES. Extracted by
computer analysis of a November 1, 1972 LANDSAT image. Scale ~1:200,000.
Specific second and third order classes are annotated on the maps.

(a) Lu-Kang Map (b) Taichung Map (c) Kuo-Hsing Map

FIGURE 4.12. WATER SURFACE CLASSIFICATION THEMES. Extracted by computer analysis
of a November 1, 1972 LANDSAT image. Scale ~1:200,000. Specific second and
third order classes are annotated on the maps. The zigzag diagonal lines in
white on the Lu-Kang Map represent missing portions of the original scan lines.

4.2.2.2 Tabulation of Land Use/Land Cover

The area of each of the 17 land uses/land covers in the three maps was
computed by the maximum likelihood ratioing technique using MSS bands 5
and 7 (Table 4.3). One hundred forty thousand image cells of 0.45 hectares
are contained in each of the three maps, representing 63,000 hectares per map.
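
The conversion from classified cell counts to hectares is a single multiplication, sketched below. The cell area and cells-per-map figures are those quoted in the text. The per-class cell counts are hypothetical, and the function name is invented for this illustration.

```python
CELL_AREA_HA = 0.45        # hectares per image cell, as stated in the text
CELLS_PER_MAP = 140_000    # image cells per 1:25,000 map, as stated in the text


def area_hectares(cell_count, cell_area_ha=CELL_AREA_HA):
    """Area in hectares covered by `cell_count` image cells."""
    return cell_count * cell_area_ha


map_area = area_hectares(CELLS_PER_MAP)
print(round(map_area))  # total hectares per map

# Hypothetical cell counts for three classes on one map.
class_cells = {110: 1_500, 211: 24_000, 540: 110}
class_area = {code: round(area_hectares(n)) for code, n in class_cells.items()}
print(class_area)
```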


TABLE 4.3. AREA OF EACH LAND USE IN HECTARES AS CLASSIFIED FROM LANDSAT IMAGERY FOR EACH 1:25,000
MAP OF 63,000 HECTARES. 17 classes mapped using "pseudo-supervised" training data and
the maximum likelihood ratioing technique applied to MSS bands 5 and 7. Columns I, II, and III
give the totals at Level I, Level II, and Level III of the classification.

                                          Lu-Kang Map             Taichung Map            Kuo-Hsing Map
Code Land Use Class                        I      II     III       I      II     III       I      II     III

100  Urban lands                       2,230       -       -   3,138       -       -     939       -       -
110    Commercial                          -   1,033       -       -     706       -       -     145       -
120    Mixed                               -   1,197       -       -   2,432       -       -     794       -
200  Agricultural lands               14,660       -       -  43,842       -       -  17,394       -       -
210    Grains                              -   6,936       -       -  15,051       -       -   3,011       -
211      Rice A                            -       -   6,596       -       -  10,792       -       -   2,363
212      Rice B                            -       -     340       -       -   4,259       -       -     649
220    Crops                               -   7,724       -       -  28,791       -       -  14,383       -
221      Crop A                            -       -   6,105       -       -  22,504       -       -   7,925
222      Crop B                            -       -     340       -       -   3,975       -       -   5,752
223      Crop C                            -       -   1,279       -       -   2,312       -       -     706
300  Forested lands                      800       -       -  13,678       -       -  44,100       -       -
310    Hardwoods                           -     794       -       -  13,243       -       -  28,470       -
311      Type A                            -       -       0       -       -     794       -       -  15,397
312      Type B                            -       -     794       -       -  12,449       -       -  13,073
320    Conifers                            -       6       -       -     435       -       -  15,630       -
400  Barren lands                      4,473       -       -   2,098       -       -     434       -       -
410    Gravels                             -      44       -       -   1,556       -       -     277       -
420    Reclaimed land                      -     258       -       -     246       -       -     132       -
430    Tidal flat                          -   4,171       -       -     296       -       -      25       -
500  Water surfaces                   40,383       -       -      50       -       -       6       -       -
510    Shallow seawater                    -  11,246       -       -       0       -       -       0       -
520    Medium seawater                     -  12,033       -       -       0       -       -       0       -
530    Deep seawater                       -  16,437       -       -       0       -       -       0       -
540    Fresh water                         -     667       -       -      50       -       -       6       -
     Unclassified (thresholded out)      454       -       -     194       -       -     127       -       -

Maximum likelihood ratioing may assign any of the 420,000 cells to an
additional 18th class when their probability of belonging to any of the 17
classes specified by training sets is lower than a selected threshold
value. Thus, the relationship between classification accuracies and threshold
values should be further investigated, and additional land uses/land
covers will be needed to categorize those unclassified cells.
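
The thresholding behaviour described above can be illustrated with a small two-band classifier. This is a sketch under stated assumptions, not the program used in the study: it assumes each class is modelled as a bivariate Gaussian in MSS bands 5 and 7, and it thresholds on the squared Mahalanobis distance to the winning class, which for two bands corresponds to a chi-square tail probability of exp(-d2/2). The training samples, the `UNCLASSIFIED` code, and all function names are hypothetical.

```python
import math

UNCLASSIFIED = 0  # hypothetical code for the extra "18th" class


def fit_signature(samples):
    """Mean vector and 2x2 covariance from (band5, band7) training samples."""
    n = len(samples)
    m5 = sum(b5 for b5, _ in samples) / n
    m7 = sum(b7 for _, b7 in samples) / n
    c55 = sum((b5 - m5) ** 2 for b5, _ in samples) / (n - 1)
    c77 = sum((b7 - m7) ** 2 for _, b7 in samples) / (n - 1)
    c57 = sum((b5 - m5) * (b7 - m7) for b5, b7 in samples) / (n - 1)
    return (m5, m7), (c55, c57, c77)


def score(cell, signature):
    """Gaussian log density and squared Mahalanobis distance of a cell to a class."""
    (m5, m7), (c55, c57, c77) = signature
    det = c55 * c77 - c57 * c57
    d5, d7 = cell[0] - m5, cell[1] - m7
    d2 = (c77 * d5 * d5 - 2.0 * c57 * d5 * d7 + c55 * d7 * d7) / det
    log_density = -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * d2
    return log_density, d2


def classify(cell, signatures, tail_probability=0.01):
    """Pick the most likely class, or UNCLASSIFIED if the cell is too far from it."""
    best_code, best_log_density, best_d2 = None, -math.inf, math.inf
    for code, signature in signatures.items():
        log_density, d2 = score(cell, signature)
        if log_density > best_log_density:
            best_code, best_log_density, best_d2 = code, log_density, d2
    # For two bands, P(distance^2 > t) = exp(-t / 2) under the Gaussian model.
    if best_d2 > -2.0 * math.log(tail_probability):
        return UNCLASSIFIED
    return best_code


# Hypothetical training sets: fresh water (540) and hardwoods (310).
signatures = {
    540: fit_signature([(10, 4), (12, 5), (11, 3), (9, 4), (13, 6)]),
    310: fit_signature([(14, 40), (16, 44), (15, 38), (13, 42), (17, 41)]),
}
for cell in [(11, 4), (15, 41), (60, 60)]:
    print(cell, classify(cell, signatures))
```

Lowering `tail_probability` shrinks the unclassified class at the risk of forcing poorly matched cells into one of the trained classes, which is the trade-off the text recommends investigating.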

4.2.3 Costs 

Cost is one of the major considerations in a land use/land cover
mapping project. The cost for each of the 1:25,000 land use/land cover
classification maps (about 1 x 1 meter in dimension) was estimated based on the
Colorado State University charge system for time on the CDC 6400 computer
(Table 4.4). Neither the cost of development and testing of procedures
applied to training sets and verification nor the cost of labor was included
in this estimate. The cost for computer time only is about U.S. $265 (N.T.
$10,070) per map, or N.T. $0.16 per hectare, using two spectral bands and
the maximum likelihood ratioing approach. It costs about twice as much
to use four spectral bands. Thus, the land use/land cover mapping can be
done much less expensively by this approach than by the conventional methods.

TABLE 4.4. COST ESTIMATES FOR EACH 1:25,000 LAND USE/LAND COVER
CLASSIFICATION MAP. Does not include cost of development and
testing procedures applied to training sets or verification. Based only on
Colorado State University charge system of $290 per hour of CDC 6400
time and does not include labor. Based on 17 classes and 140,000 cells
of approximately 0.45 hectare. Based on 1 U.S. $ = 38 N.T. $.

4-Band Case
                                       Time in Seconds                 Cost
Operations Performed            Central Processor  Input/Output    U.S. $     N.T. $

Format conversion,
Geometric correction,
Graymapping                          1,300            1,300         $255    $ 9,690
Map classification, Display          2,400              400          320     12,160
Totals                               3,700            1,700         $575    $21,850
                                                       or N.T. $0.35/hectare

2-Band Case
                                       Time in Seconds                 Cost
Operations Performed            Central Processor  Input/Output    U.S. $     N.T. $

Format conversion,
Geometric correction,
Graymapping                            650              650         $125    $ 4,750
Map classification, Display          1,000              400          140      5,320
Totals                               1,650            1,050         $265    $10,070
                                                       or N.T. $0.16/hectare
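
The currency conversion and per-hectare figures in Table 4.4 can be checked with a short calculation. The exchange rate, map area, and U.S. dollar totals are taken from the table and text; the function name is invented here. The sketch does not attempt to reproduce the university charge formula itself, which the report does not spell out.

```python
NT_PER_USD = 38        # exchange rate stated in Table 4.4
MAP_AREA_HA = 63_000   # hectares per 1:25,000 map


def cost_summary(usd_per_map):
    """Return (N.T. $ per map, N.T. $ per hectare) for a U.S. $ cost per map."""
    nt_per_map = usd_per_map * NT_PER_USD
    return nt_per_map, nt_per_map / MAP_AREA_HA


for label, usd in (("4-band", 575), ("2-band", 265)):
    nt_per_map, nt_per_ha = cost_summary(usd)
    print(f"{label}: N.T. ${nt_per_map:,} per map, N.T. ${nt_per_ha:.2f} per hectare")
```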



V. CONCLUSIONS 


The results of this study were based upon single date LANDSAT
imagery and the availability of limited ground control. The land use/
land cover classification scheme could be revised if better ground
control data became available and the accuracy of the classification
maps correspondingly improved. The testing completed to date has
provided a logical, scientific basis for the land use/land cover
classification mapping of Taiwan using the LANDSAT MSS imagery.
It has covered the complete spectrum of land covers/water types
occurring in the study site. The study site was selected as
representative of the complete spectrum of land covers and water types of
Taiwan and the results should be applicable to the entire island.
Additional subdivision of some second level classes, such as the
agricultural lands and forested lands, must be investigated in more
detail to achieve even more meaningful subcategories of these classes.
Offshore seawater classes were found to be more closely related to
suspended sediment content than actual water depth but could not be
calibrated due to the total lack of ground control. A further
investigation should be undertaken of the study of coastal processes by
LANDSAT remote sensing.

Three approaches for selection of training sets were tested. The
non-supervised method was employed without reference to specific ground
control data and classified the first level classes with a 79% training
set accuracy. Ground control data from black and white airphotos
extracted by point photointerpretation provided a second method for
establishing the training sets. A relatively low training set classification
accuracy was obtained when using this sampled data in a supervised
approach due to the misregistration of the ground control data and the
heterogeneous nature of the training sets selected. Better registration
and reliable ground control data at third level classes should improve
the accuracy of this approach. The pseudo-supervised approach provided
the best training sets and the most accurate training set classification
results of 89% for first level classes. However, this approach requires
prior information about the natural grouping of land cover types in order
to obtain reasonable subdivisions of the second level classes. Based
upon these results the best composite approach appears to be:

1. Use unsupervised or cluster analysis to identify and display
the natural land cover classes which can be separated in a
multispectral sense.

2. Use airphotos to identify the unknown land covers.

3. Select specific training sets to represent these desired
land covers and apply the supervised approach to prepare
the final classification map.

Statistical cleaning was proposed to increase the training set 
accuracy. The tests completed in this study have shown that statis- 
tical cleaning does not significantly improve the actual training set 
accuracy. The best way to improve this accuracy was to reselect 
improved or more representative training sets in an iterative or 
learning procedure. However, a final evaluation of the value of 
statistical cleaning remains to be tested by determining its impact 
upon final map verification accuracy. 

Costs and accuracy are the two major considerations in a land
use/land cover mapping project. Effort must be made to achieve the
highest accuracy at the lowest expense. The quality of training sets
selected will directly affect the accuracies of the classification map
when a supervised approach is used. Once the training sets have been
selected the classification accuracy may be improved by adding
spectral bands/ratios. Adding these additional variables
correspondingly increases the cost of computing the classification map. The
tests made on three different training sets established that four selected
MSS bands/ratios provided a comparable accuracy with that obtained
by all 10 MSS bands/ratios. The ratios of the MSS bands were shown
to contribute little additional accuracy to the training set classifications
performed by stepwise discriminant analysis. Moreover, MSS bands
5 and 7 provide the same overall training set accuracy as the four
MSS bands/ratios identified by the stepwise tests. The preliminary
verification of the final classification map provided further support
for the selection of MSS bands 5 and 7. Thus, substantial savings
were achieved by selecting two specific, sensitive spectral bands
without any significant loss of final classification accuracy.
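
For reference, the 10 channels discussed above (four MSS bands plus their six ratios) can be formed per cell as sketched below. The orientation of each ratio is an assumption: the report names 6/4 and 5/4 explicitly, so this sketch divides the higher-numbered band by the lower-numbered one for every pair. The cell values and the function name are hypothetical.

```python
from itertools import combinations

MSS_BANDS = (4, 5, 6, 7)


def ten_channels(cell):
    """Return the 4 band values plus the 6 pairwise band ratios for one image cell.

    `cell` maps MSS band number to its value. Ratios are keyed as "high/low".
    """
    channels = {str(band): float(cell[band]) for band in MSS_BANDS}
    for low, high in combinations(MSS_BANDS, 2):
        denominator = cell[low]
        # Guard against division by zero for cells with no return in a band.
        channels[f"{high}/{low}"] = (cell[high] / denominator
                                     if denominator else float("nan"))
    return channels


cell = {4: 20, 5: 16, 6: 30, 7: 12}  # hypothetical cell values
channels = ten_channels(cell)
print(sorted(channels))
```

A stepwise procedure then adds these channels one at a time in order of usefulness, while the two-band case simply keeps channels "5" and "7".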

The training sets established by the pseudo-supervised approach
could be applied to the entire country. The "signature" extension
from the northern image, where the training sites occur, to the
southern image appears feasible as they were collected only a few
minutes apart and are adjacent on the same LANDSAT orbit. However,
this supposition should be verified. The land area of Taiwan is
equivalent to approximately 60 of the 1:25,000 maps analyzed here.
Preparing similar classifications for these 60 maps for the 17 land
use/land cover classes using only MSS bands 5 and 7 would cost about
U.S. $15,000 (N.T. $570,000) in computer time (Table 4.4). This
mapping approach could be economically completed for the whole
island in a short period, yielding timely, up-to-date land use/land
cover maps and area statistics.

The final classification map in this study achieved over 89%
training set accuracy at the first level of land use/land cover using a
single date LANDSAT image. Significant increase in this classification
accuracy could be achieved by analyzing LANDSAT imagery
taken on different dates during a given growing season and overlaid to
provide a basis for the simultaneous, multispectral, multidate processing
already tested by others. Confusions in classes, such as urban lands and
crops, can be further reduced by increasing the dimensions of spectral
space by the addition of new spectral bands as is planned by NASA for
future LANDSAT-type satellites and the thematic mapper satellites.
Additional improvements can also be achieved by the input of overlays of
cellularized maps, e.g., topography and other ancillary data, into
the classification procedure.

Proper display of the final classification map is as
important as the final map accuracy. A better display of the resulting
classification map can encourage wider usage of the approach and
product. Computer line printer displays provide a cheap product
which is compatible with topographic maps and can portray all the
detailed cell-by-cell distribution of each land use/land cover at
1:25,000. Microfilm display of specific theme maps provides a
better overview of the spatial distribution of particular land use/land
cover classes. Unfortunately, black and white microfilm or line
printer displays have insufficient gray levels to display
effectively more than three to five discrete classes. Display of the final
multiclass maps in color on a computer color film generator can
overcome many of the display handicaps encountered in this study.

Additional field verification of the three computed classification
maps will be undertaken in the near future to obtain map verification
accuracy in a more absolute sense. It will also provide more detailed
information for further training set selections and the development of
an improved land use/land cover hierarchy. The correlation between
multispectral clustering and actual occurrence of land use/land cover
will also be more accurately established.

APPENDIX A


LANDSAT MAPPING SYSTEM (LMS)

The LANDSAT Mapping System or LMS is a total rewriting of the RECOG
or RECOGnition Mapping System (Smith, Miller and Ells, 1972; Ells, Miller
and Smith, 1972a and 1972b). RECOG was designed principally for training
purposes and this new LMS system is compatible with it. However, the new
design is for specific use with LANDSAT imagery for map and composite
mapping system (CMS) overlay, low cost, ease in understanding, flexibility,
export to other user computers, and high volume production (Miller,
Maxwell, and Riggs, 1977).

This system consists of four major steps. The first step is to
prepare map overlays in a desired scale by inputting LANDSAT CCTs. The
second step is to interleave images from various dates. Multiple ancillary
or map data planes can also be overlaid on the image cells in this step.
The third step is to compute and optimize the statistical representation
of the materials to be mapped. The final step is then to map the
distribution of each material sought and display the classification maps as line
printer and microfilm rectified and scaled maps.

The computer cost was estimated for one 1:24,000 quadrangle map
using a CDC 6400 computer and the charge system of Colorado State
University.

Step 1. IMAGE PREPARATION/MAP OVERLAY.

UP TO 4 TAPES, REPRESENTING ~25 MILE
E-W SEGMENTS OF A GIVEN LANDSAT IMAGE,
MAY BE INPUT SIMULTANEOUSLY.

"CONVERTS" THE LANDSAT FORMAT TAPE(S) INTO THE
INTERNAL, SINGLE RECOG TAPE. ONLY THE PORTION
OF THE IMAGE NEEDED TO OVERLAY THE SELECTED
MAP IS CONVERTED AND POOLED TOGETHER.

"ROTATE" RESAMPLES THE ORIGINAL IMAGE CELLS TO
REPRESENT ANY SIZE RECTANGULAR OR SQUARE CELL
AS SELECTED BY THE USER. ADJUSTS FOR ORIGINAL
IMAGE DISTORTIONS. SCALES IMAGE TO MAP SCALE
(E.G., 1:24,000).

"FILTERS" THE IMAGE.

"DISPLAYS" 1, 2, 3 ... OR ALL OF THE INDIVIDUAL
SPECTRAL BANDS IN THE ORIGINAL OR MAP OVERLAY
FORMAT. DISPLAY OPTIONS INCLUDE LINEPRINTER
AND MICROFILM GRAYMAPS.

"LANDSAT" COMPUTER COMPATIBLE TAPE (CCT) AS SUPPLIED BY EROS
DATA CENTER.

"RECOG" FORMATTED TAPE (OR DISK) FILE - A STANDARD FORMAT TAPE
USED THROUGHOUT THE IMAGE PROCESSING ACTIVITY. (n = 1 to 4)

Step 2. INTERLEAVES IMAGES FROM VARIOUS DATES.

UP TO 10 RECOG FORMATTED TAPES
OF A VARYING NUMBER OF SPECTRAL
BANDS ARE INPUT.

"TRIMS" EACH RECOG FORMATTED TAPE
(OR FILE) TO A SELECTED NUMBER
OF LINES AND COLUMNS DESIGNATED
BY THE USER, USUALLY THOSE
NEEDED TO COVER MAP SELECTED.
LINES AND COLUMNS ARE RENUMBERED,
BEGINNING AT 1,1.

"COMBINES" RECOG FORMATTED DATA FROM THE 1
TO 10 SEPARATE INPUT TAPES (FILES) INTO
1 COMPOSITE RECOG TAPE (FILE) REPRESENTING
A MULTIDATE, MULTISPECTRAL IMAGE.

"DISPLAYS" 1, 2, 3 ... OR ALL OF THE INDIVIDUAL
SPECTRAL BANDS IN COMBINED IMAGE.
DISPLAY OPTIONS INCLUDE LINEPRINTER AND
MICROFILM GRAYMAPS.

(i and j are any integers)

STEP 2. AUXILIARY PROGRAMS.

"ANCILLARY" CREATES RECOG FORMATTED DATA
FROM CELLULARIZED MAP DATA PLANES INPUT
IN CARD OR MAGNETIC TAPE FORMAT.
MAP CELLS MUST BE THE SAME SIZE OR
SOME INTEGER MULTIPLE OF THE CELLS ON
THE RECOG FORMATTED DATA WITH WHICH
THE ANCILLARY DATA WILL BE COMBINED.












Step 3. C0WTE5 STATISTICAL "SIGNATURES" OF MATERIALS TO BE MAPPED, 


r \ 
v R V 




"EXTRACTS" the training field data identi- 
fied BY THE USER (RECTANGLES, IRREGULAR 
AREAS, AND POINTS) FROM THE RECOG IMAGE 
FORMAT. 

"TWiSFORMS" the training field data, 
forms' ratios of specified spectral bands, 

USES ELEVATION OVERLAYS TO ADJUST SPEC- 
TRAL BANDS FOR TERRAIN SHADOWING, ETC. 

"CLEANS" OUT TRAINING FIELD DATA POINTS WITH
LOW PROBABILITY OF BEING THE SELECTED
MATERIAL OR HIGH PROBABILITY OF BEING
SOME OTHER MATERIAL, ETC.

"GROUPS" TRAINING SETS TOGETHER WHICH WERE
ORIGINALLY SELECTED IN EXTRACT TO RE-
PRESENT SEPARATE MATERIALS BUT ARE NOW
DETERMINED TO BE STATISTICALLY SIMILAR.

"CLASSIFIES" THE TRAINING FIELDS USING
MAXIMUM-LIKELIHOOD APPROACH (STEPWISE
DISCRIMINANT ANALYSIS). OTHER DECISION
RULES CAN BE SUBSTITUTED HERE.

"OVERLAYS" ANY VARIABLE
OR RESULT IN POINT FILE
INTO A RECOG FORMAT FOR
DISPLAY AND MAP OVER-
LAY.

"SIGNATURES" COMPUTES
STATISTICAL REPRESENTA-
TION OF EACH MATERIAL
SPECIFIED BY THE USER
FOR USE IN MAPPING THESE
MATERIALS ON ANY DATA
TAKEN FROM THE SAME
ORIGINAL IMAGE.

"PRINTS" OR "PUNCHES" OUT
ANY VARIABLE(S) IN THE
POINT FILE FOR FURTHER
ANALYSIS IN ADDITIONAL
PROGRAMS WRITTEN BY THE
USER.


"POINT" BY POINT TAPE (OR DISK) FILE. - AN INTERNAL TAPE, DISK, AND/OR 
CARD FILE FORMAT WHICH CONTAINS ONLY THE EXTRACTED TRAINING FIELD DATA 
AND DOES NOT MAINTAIN ITS CORRECT MAP OVERLAY POSITION. 
STEP 4. MAPS DISTRIBUTION OF EACH MATERIAL.



(i and k are any integers) 


"TRANSFORMS" DATA FOR EACH IMAGE CELL AS
TESTED AND SELECTED IN STEP 3.


"MAPS" OUT THE DISTRIBUTION OF EACH SURFACE 
MATERIAL SPECIFIED BY THE USER. 


"DISPLAYS" THE SELECTED IDENTIFICATION OF 
EACH IMAGE CELL AND/OR PROBABILITY THAT 
IT IS THE MATERIAL DESIGNATED. DISPLAY 
OPTIONS INCLUDE LINEPRINTER AND MICROFILM 
GRAYMAPS AND LINEPRINTER COLOR SYMBOL MAPS. 


STEP 4. AUXILIARY PROGRAM.


"ZOOMS" OR ENLARGES THE RECOG FOR-
MATTED TAPE (OR FILE) BY ECHOING
EACH IMAGE CELL "N" TIMES ON A
LINE AND REPEATING EACH LINE "M"
TIMES.
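
The enlargement performed by ZOOM can be illustrated with a minimal sketch. The function name `zoom` and the list-of-lists image layout are assumptions made for this illustration; they are not taken from the RECOG code, which works on tape or disk files.

```python
def zoom(image, n, m):
    """Echo each cell n times along its line, then repeat each line m times."""
    enlarged = []
    for line in image:
        stretched = [cell for cell in line for _ in range(n)]
        # Copy the stretched line so the m output lines are independent lists.
        enlarged.extend(list(stretched) for _ in range(m))
    return enlarged


# Hypothetical 2 x 2 image; each number stands for one image cell.
cells = [[1, 2],
         [3, 4]]
enlarged = zoom(cells, n=3, m=2)
for line in enlarged:
    print(line)
```

With N = 3 and M = 2 the 2 x 2 image becomes 4 lines of 6 cells, which is how a coarse cell grid can be stretched to match a larger map scale.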
LANDSAT MAPPING SYSTEM (LMS)


ITEM          DEVELOPMENT STATUS   COST ESTIMATE*

CONVERT       100%                 $5/date
ROTATE        100%                 $7/date
FILTER        100%                 $6/date
DISPLAY       100%                 $1/band/date x 2 bands = $2/date
STEP 1        100%                 $20/date
                                   Assuming 3 dates involved gives $20/date x 3 = $60

TRIM          100%                 $3/date
COMBINE       100%                 3 dates combined = $1
DISPLAY       100%                 $1/band/date x 1 band = $1
ANCILLARY     100%                 optional
STEP 2        100%                 Assuming 3 dates gives $3/date x 3 dates + $1 + $1 = $11

EXTRACT       100%                 $10 (approx.)
TRANSFORM     100%                 $5 (approx.)
CLEAN         100%                 $2/iteration x 3 iterations = $6 (approx.)
CLASSIFY      100%                 $8/iteration x 3 iterations = $24 (approx.)
SIGNATURES    100%                 $2 (approx.)
OVERLAY       90%                  optional
GROUP         90%                  optional
PRINT/PUNCH   95%                  optional
STEP 3        98%                  Based on 2,000 points = $50 (approx.)

TRANSFORM     0%                   $5 (approx.)
MAP           100%**               based on mapping 30 material types: $73 (approx.)
DISPLAY       100%**               black-and-white lineprinter symbol map: $1 (approx.)
ZOOM          0%                   optional
STEP 4        75%                  Based on 30 classes mapped = $79 (approx.)

STEP TOTAL* =                      $200 (approx.)

*  Estimated computer costs for one 1:24,000 quad map with:
       approx. 1 acre cells
       3 dates (12 spectral bands)
       2,000 cells defining training fields
       30 material types
       black-and-white lineprinter display.

** Extensive modification needed to improve efficiency.
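
The step subtotals in the cost table follow from the per-item figures and the three-date assumption in its footnote. The short sketch below simply redoes that arithmetic; the variable names are illustrative, and the dollar amounts are copied from the table.

```python
DATES = 3  # number of LANDSAT dates assumed in the table footnote

# STEP 1: CONVERT, ROTATE, FILTER and DISPLAY are charged per date.
step1 = (5 + 7 + 6 + 2) * DATES
# STEP 2: TRIM per date, plus COMBINE and DISPLAY charged once.
step2 = 3 * DATES + 1 + 1
# STEP 3: EXTRACT, TRANSFORM, 3 x CLEAN, 3 x CLASSIFY, SIGNATURES.
step3_items = 10 + 5 + 2 * 3 + 8 * 3 + 2
# STEP 4: TRANSFORM, MAP, DISPLAY.
step4_items = 5 + 73 + 1

# Subtotals as quoted in the table (STEP 3 is quoted as approx. $50).
quoted = {"STEP 1": 60, "STEP 2": 11, "STEP 3": 50, "STEP 4": 79}
quoted_total = sum(quoted.values())

print("itemized:", step1, step2, step3_items, step4_items)
print("quoted total:", quoted_total)
```

The itemized STEP 3 entries add to $47, which the table rounds to approximately $50; with that rounding the four quoted subtotals give the stated total of about $200.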





APPENDIX B 


STEPWISE DISCRIMINANT ANALYSIS


Discriminant analysis consists of finding a transform which
maximizes the ratio of the difference between class multivariate
means to the class multivariate variances. The algorithm used here,
entitled CLASSIFY (Appendix A), computes a classification function
for each of the classes by choosing and entering the independent
variables, the 10 MSS band/ratio values, in a stepwise manner. The
variable or band/ratio entered at each step is selected on the basis of
its F statistic. As each MSS band/ratio is added, a classification
function is computed for each land use/land cover class. The equation
of the classification function $D_{ki}$ for the k-th class for the i-th
variable or band/ratio is given by

$$ D_{ki} = C_{k0} + \sum_{i=1}^{r} e_{ki} Z_{lki} $$

where $C_{k0}$ is a constant term for the k-th class, r is the number of
input variables (the 10 spectral bands and ratios), $e_{ki}$ is the
discriminant coefficient for the k-th group and the i-th band/ratio, and
$Z_{lki}$ is the measured spectral radiance of the l-th cell of the k-th
class for the i-th variable (10 bands/ratios).

The coefficient $e_{ki}$ is computed from

$$ e_{ki} = (n - g) \sum_{j=1}^{r} \bar{X}_{kj} a_{ij} $$

and the constant term is computed from

$$ C_{k0} = -\frac{1}{2} \sum_{i=1}^{r} e_{ki} \bar{X}_{ki} $$

where $\bar{X}_{ki}$ is the mean of band/ratio i for class k, $n_k$ is the
number of cells in class k, n is the total number of cells, and g is the
total number of land use/land cover classes sought in the analyses.

CLASSIFY is a modified version of BMD07M, which is part of
the UCLA biomedical statistical package available on most major
computers (BMD Manual, 1973). It has not been modified in statis-
tical approach but in input, output, and internal control to enable it
to handle much larger data bases in ways not envisioned by the
original authors.

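The within and total cross-product matrices are expressed as
below:

$$ W = \{w_{ij}\}, \qquad w_{ij} = \sum_{k=1}^{g} \sum_{n=1}^{n_k} (X_{ikn} - \bar{X}_{ik}) (X_{jkn} - \bar{X}_{jk}) $$

$$ T = \{t_{ij}\}, \qquad t_{ij} = \sum_{k=1}^{g} \sum_{n=1}^{n_k} (X_{ikn} - \bar{X}_{i}) (X_{jkn} - \bar{X}_{j}) $$

where $n_k$ is the number of cells in class k, $\bar{X}_{ik}$ is the mean
of variable i in class k, $\bar{X}_{i}$ is the mean of variable i over all
cells, and

i = 1, 2, 3, ..., p variables (10 spectral band/ratio radiances)
j = 1, 2, 3, ..., p variables (10 spectral band/ratio radiances)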
At each step of the procedure the variables (radiances or ratios)
are divided into two disjoint sets: those included in the discriminant
functions and those not included. Assume for simplicity that the first
r variables are included. The within-group matrix of cross products
of deviations (W) and the matrix of cross products of deviations for
the total sample (T) are partitioned into

$$ W = \begin{bmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{bmatrix}, \qquad T = \begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix} $$

where $W_{11}$ and $T_{11}$ are r by r.

The elements $a_{ij}$ are derived from matrix A, and the elements
$b_{ij}$ from matrix B.

The optimum input variables (spectral bands/ratios) are chosen
on the basis of the largest F-statistic for entry, where n is the total
number of cells, g is the number of classes, and r is the number of
variables already included. The degrees of freedom are g - 1 and
n - r - g + 1. An iterative stepwise technique is used to determine
the best linear combination of spectral bands/ratios (Siegel, 1976).



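Posterior probability of cell n in group k is computed in step-
wise discriminant analysis by the equation shown below (BMD Manual,
1973):

$$ P_{kn} = \frac{\exp(D_{kn})}{\sum_{j=1}^{g} \exp(D_{jn})} $$

where $D_{kn}$ is the classification function of class k evaluated for
cell n.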
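
The classification functions and posterior probabilities described in this appendix can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: all variables are entered at once (the stepwise F-to-enter selection is omitted), class priors are taken as equal, and the function names and the synthetic two-band data are hypothetical rather than part of CLASSIFY or BMD07M.

```python
import numpy as np


def classification_functions(X, y):
    """Return classes, coefficients e_ki and constants C_k0 (all variables entered)."""
    classes = np.unique(y)
    n, g = len(y), len(classes)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Within-group cross-product matrix W of deviations from the class means.
    W = sum((X[y == k] - means[i]).T @ (X[y == k] - means[i])
            for i, k in enumerate(classes))
    A = np.linalg.inv(W)                         # elements a_ij
    coef = (n - g) * means @ A                   # e_ki = (n - g) * sum_j Xbar_kj a_ij
    const = -0.5 * np.sum(coef * means, axis=1)  # C_k0 = -1/2 * sum_i e_ki Xbar_ki
    return classes, coef, const


def posterior(X, coef, const):
    """Posterior probability of each cell in each class (equal priors assumed)."""
    D = X @ coef.T + const
    D = D - D.max(axis=1, keepdims=True)         # stabilise the exponentials
    P = np.exp(D)
    return P / P.sum(axis=1, keepdims=True)


# Synthetic two-band training data for two hypothetical classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([30.0, 10.0], 2.0, size=(50, 2)),
               rng.normal([10.0, 40.0], 2.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

classes, coef, const = classification_functions(X, y)
post = posterior(X, coef, const)
predicted = classes[post.argmax(axis=1)]
print("training set accuracy:", (predicted == y).mean())
```

Each cell is assigned to the class with the largest classification function, which is also the class with the largest posterior probability.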



APPENDIX C 


TRAINING SET CLASSIFICATION ACCURACY USING THE
"PSEUDO-SUPERVISED" TRAINING DATA

Two iterations of statistical cleaning were applied to the
"pseudo-supervised" training sets. The overall and individual class
accuracies were interpreted from the 10 band/ratio training set
accuracy matrix. An overall training set accuracy of 85% was achieved
before any statistical cleaning was applied. The overall apparent
training set accuracy increased to 99.1% after two iterations of
statistical cleaning, while the overall actual training set accuracy
decreased to 83%. The training set accuracy for the 17 individual
classes is shown on the diagonal of each matrix.



TABLE C-1. A COMPARISON OF THE OPTIMAL FOUR CHANNELS SELECTED FOR THREE SETS OF TRAINING DATA
IN ALL LEVELS OF CLEANING AND THEIR RESPECTIVE F VALUES TO ENTER.

                        Before Cleaning       1st Cleaning          2nd Cleaning          3rd Cleaning
Sampling Method         Optimal 4  F to enter Optimal 4  F to enter Optimal 4  F to enter Optimal 4  F to enter

(1) Unsupervised        7          2862.86    7          3541.67    7/4        3762.05    7/4        3680.59
                        5           946.54    4          1422.41    4          1552.12    4          1586.72
                        4           316.85    5           468.98    5           512.41    5           507.57
                        5/4         382.94    5/4         531.06    5/4         469.79    5/4         464.33

(2) Supervised          6/4         938.25    7/6        1154.84    7/6        1025.73    6          1039.85
                        5           215.75    4           485.98    4           629.70    4           782.61
                        6           145.34    5           131.91    5           154.66    6/4         392.92
                        4           160.58    5/4         124.18    5/4         175.98    5           197.61

(3) Pseudo-supervised   6/4        1457.20    7/4        1890.42    7/4        2127.08    --         --
                        5/4         687.32    5/4         940.88    5/4        1044.60    --         --
                        6           221.10    7           405.37    7           494.32    --         --
                        4           455.94    4           466.57    4           484.66    --         --
TABLE C-2. TRAINING SET CLASSIFICATION ACCURACY USING THE "PSEUDO-SUPERVISED" TRAINING DATA.

17 classes showing the apparent increase in training set accuracy in percent using 10 channels (4 LANDSAT MSS
bands and their 6 ratios). Only the residual training data was classified after the 1st level of statistical cleaning
had been applied.

Level I codes: 100 Urban lands (110-120); 200 Agricultural lands (211-223); 300 Forested lands (311-320);
400 Barren lands (410-430); 500 Water surfaces (510-540). Rows are Level II and III land use classes with the
number of points in each training set (T.S.); columns are the classes assigned. ? = entry illegible in the source copy.

Code Class             No. 110 120 211 212 221 222 223 311 312 320 410 420 430 510 520 530 540

110  Commercial         90 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
120  Mixed             126   3   ?   0   0   0   0   0   0   0   0   6   1   0   0   0   0   0
211  Grain A            71   0   1   ?   0   0   1   0   0   0   0   0   0   0   0   0   0   0
212  Grain B            96   0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0
221  Crop A             82   0   0   1   0  94   5   0   0   0   0   0   0   0   0   0   0   0
222  Crop B             82   0   0   0   0   6  90   0   0   0   0   0   0   0   0   0   0   0
223  Crop C             59   0   0   0   2  19   0   ?   0   0   0   0   0   0   0   0   0   0
311  Hardwoods A       131   0   0   0   0   0   0   0   ?  10   0   0   0   0   0   0   0   0
312  Hardwoods B       211   0   0   0   0   3   0   0   2  95   0   0   6   0   0   0   0   0
320  Conifers           78   0   0   0   0   0   0   0   4   6  90   0   0   0   0   0   0   0
410  Gravels            76   0  14   0   0   0   0   0   0   0   0   ?   0   0   0   0   0   0
420  Reclaimed          64   2   0   0   0   0   0   0   0   0   0   0   ?   0   0   0   0   0
430  Tidal flat         91   1   0   0   0   0   0   0   0   0   0   0   0   ?   0   0   0   0
510  Shallow seawater   86   0   0   0   0   0   0   0   0   0   0   0   0   0 100   0   0   0
520  Medium seawater    89   0   0   0   0   0   0   0   0   0   0   0   0   0   0 100   0   0
530  Deep seawater      89   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 100   0
540  Fresh water        79   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 100

Overall accuracy = 94.6% obtained by 1514 correct identifications (diagonal) divided by 1600 residual samples in all training sets.
TABLE C-3. TRAINING SET CLASSIFICATION ACCURACY USING THE "PSEUDO-SUPERVISED" TRAINING DATA.

17 classes showing the actual increase in training set accuracy in percent using 10 channels (4 LANDSAT MSS bands
and their 6 ratios). All the original training data was classified with matrices obtained from the training data which
remained after the 1st level of statistical cleaning.

Level I codes: 100 Urban lands (110-120); 200 Agricultural lands (211-223); 300 Forested lands (311-320);
400 Barren lands (410-430); 500 Water surfaces (510-540). Rows are Level II and III land use classes with the
number of points in each training set (T.S.); columns are the classes assigned. ? = entry illegible in the source copy.

Code Class             No. 110 120 211 212 221 222 223 311 312 320 410 420 430 510 520 530 540

110  Commercial        100  91   8   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
120  Mixed             157  13   ?   1   0   0   0   0   0   0   0  12   3   0   0   0   0   0
211  Grain A            82   0   6  86   0   1   5   0   0   0   0   2   0   0   0   0   0   0
212  Grain B            96   0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0
221  Crop A            104   0   0   2   2  75  20   0   0   1   0   0   0   0   0   0   0   0
222  Crop B            100   0   0   2   0  20  74   0   0   0   4   0   0   0   0   0   0   0
223  Crop C             75   0   0   0   5  25   0   ?   0   7   0   0   0   0   0   0   0   0
311  Hardwoods A       156   0   0   0   0   4   0   0  76  20   0   0   0   0   0   0   0   0
312  Hardwoods B       246   0   0   0   2   5   0   0  12  81   0   0   0   0   0   0   0   0
320  Conifers           93   0   0   0   1   3   1   0   8  12  75   0   0   0   0   0   0   0
410  Gravels            95   0  30   1   0   0   0   0   0   0   0  69   0   0   0   0   0   0
420  Reclaimed          70   4   4   0   0   0   0   0   0   0   0   1  90   0   0   0   0   0
430  Tidal flat         96   6   0   0   0   0   0   0   0   0   0   0   0  94   0   0   0   0
510  Shallow seawater   90   0   0   0   0   0   0   0   0   0   0   0   0   3   ?   0   0   0
520  Medium seawater    90   0   0   0   0   0   0   0   0   0   0   0   0   0   0  99   0   1
530  Deep seawater      90   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  99   1
540  Fresh water        84   0   0   0   0   0   0   0   0   0   0   0   0   0   5   0   0  95

Overall accuracy = 83.3% obtained by 1519 correct identifications (diagonal) divided by 1824 total samples in all training sets.
TABLE C-4. TRAINING SET CLASSIFICATION ACCURACY USING THE "PSEUDO-SUPERVISED" TRAINING DATA.

17 classes showing the apparent increase in training set accuracy in percent using 10 channels (4 LANDSAT MSS bands
and their 6 ratios). Only the residual training data was classified after the 2nd level of statistical cleaning had been
applied.

Level I codes: 100 Urban lands (110-120); 200 Agricultural lands (211-223); 300 Forested lands (311-320);
400 Barren lands (410-430); 500 Water surfaces (510-540). Rows are Level II and III land use classes with the
number of points in each training set (T.S.); columns are the classes assigned. ? = entry illegible in the source copy.

Code Class             No. 110 120 211 212 221 222 223 311 312 320 410 420 430 510 520 530 540

110  Commercial         86 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
120  Mixed             115   0  99   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0
211  Grain A            69   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0
212  Grain B            96   0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0
221  Crop A             76   0   0   3   0   ?   0   0   0   0   0   0   0   0   0   0   0   0
222  Crop B             74   0   0   0   0   3  97   0   0   0   0   0   0   0   0   0   0   0
223  Crop C             48   0   0   2   0   4   0   ?   0   0   0   0   0   0   0   0   0   0
311  Hardwoods A       118   0   0   0   0   0   0   0  99   1   0   0   0   0   0   0   0   0
312  Hardwoods B       193   0   0   0   0   0   0   0   0 100   0   0   0   0   0   0   0   0
320  Conifers           70   0   0   0   0   0   0   0   0   0 100   0   0   0   0   0   0   0
410  Gravels            67   0   4   0   0   0   0   0   0   0   0  96   0   0   0   0   0   0
420  Reclaimed          59   0   0   0   0   0   0   0   0   0   0   0 100   0   0   0   0   0
430  Tidal flat         84   0   0   0   0   0   0   0   0   0   0   0   0 100   0   0   0   0
510  Shallow seawater   84   0   0   0   0   0   0   0   0   0   0   0   0   0 100   0   0   0
520  Medium seawater    89   0   0   0   0   0   0   0   0   0   0   0   0   0   0 100   0   0
530  Deep seawater      85   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 100   0
540  Fresh water        79   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 100

Overall accuracy = 99.1% obtained by 1479 correct identifications (diagonal) divided by 1492 total samples in all training sets.
TABLE C-5. TRAINING SET CLASSIFICATION ACCURACY USING THE "PSEUDO-SUPERVISED" TRAINING DATA.

17 classes showing the actual increase in training set accuracy in percent using 10 channels (4 LANDSAT MSS bands
and their 6 ratios). All the original training data was classified with matrices obtained from the training data which
remained after the 2nd level of statistical cleaning.

Level I codes: 100 Urban lands (110-120); 200 Agricultural lands (211-223); 300 Forested lands (311-320);
400 Barren lands (410-430); 500 Water surfaces (510-540). Rows are Level II and III land use classes with the
number of points in each training set (T.S.); columns are the classes assigned. ? = entry illegible in the source copy.

Code Class             No. 110 120 211 212 221 222 223 311 312 320 410 420 430 510 520 530 540

110  Commercial        100  91   8   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
120  Mixed             157  12  73   1   0   0   0   0   0   0   0  13   1   0   0   0   0   0
211  Grain A            82   0   7  87   0   1   4   0   0   0   0   1   0   0   0   0   0   0
212  Grain B            96   0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0
221  Crop A            104   0   0   4   1   ?   ?   0   0   1   0   0   0   0   0   0   0   0
222  Crop B            100   0   0   3   0  21  72   0   0   4   0   0   0   0   0   0   0   0
223  Crop C             75   0   0   1   5  27   0  60   0   7   0   0   0   0   0   0   0   0
311  Hardwoods A       156   0   0   0   0   3   0   0  75  21   0   0   0   0   0   0   0   1
312  Hardwoods B       246   0   0   0   2   5   0   0  11  82   0   0   0   0   0   0   0   0
320  Conifers           93   0   0   0   3   2   0   0   8  12  75   0   0   0   0   0   0   0
410  Gravels            95   0  31   1   0   0   0   0   0   0   0  68   0   0   0   0   0   0
420  Reclaimed          70   4   4   0   0   0   0   0   0   0   0   0  92   0   0   0   0   0
430  Tidal flat         96   6   0   0   0   0   0   0   0   0   0   0   0  94   0   0   0   0
510  Shallow seawater   90   0   0   0   0   0   0   0   0   0   0   0   0   4   ?   0   0   0
520  Medium seawater    90   0   0   0   0   0   0   0   0   0   0   0   0   0   0  99   0   1
530  Deep seawater      90   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   ?   4
540  Fresh water        84   0   0   0   0   0   0   0   0   0   0   0   0   5   0   0   0  95

Overall accuracy = 83.0% obtained by 1514 correct identifications (diagonal) divided by 1824 total samples in all training sets.
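
The overall accuracies quoted under Tables C-2 through C-5 are the number of correct (diagonal) identifications divided by the number of samples classified. The short check below recomputes them from the counts given in the table footers; the dictionary layout is only an illustration.

```python
# (correct identifications, samples classified) from the footers of Tables C-2 to C-5.
tables = {
    "C-2": (1514, 1600),  # apparent, residual samples after 1st cleaning
    "C-3": (1519, 1824),  # actual, all original samples, 1st cleaning
    "C-4": (1479, 1492),  # apparent, residual samples after 2nd cleaning
    "C-5": (1514, 1824),  # actual, all original samples, 2nd cleaning
}

accuracy = {name: 100.0 * correct / total
            for name, (correct, total) in tables.items()}

for name, value in accuracy.items():
    print(f"Table {name}: {value:.1f}%")
```

The apparent accuracies rise because the cleaned-out points are excluded from the denominator (1600 and 1492 residual samples), whereas the actual accuracies are always measured against all 1824 original training samples.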
APPENDIX D

ACTUAL INCREASE IN TRAINING SET ACCURACY ACHIEVED
AT EACH LEVEL OF STATISTICAL CLEANING

Two iterations of statistical cleaning were applied to the
"pseudo-supervised" training sets. Seventeen classes are repre-
sented, based on classification by the 10 MSS bands/ratios. The
training set accuracy increased as each band/ratio was added. The
accuracy approaches a limit after three or four bands in urban,
agricultural and forested lands. However, barren lands and water
classes, such as gravels and medium seawater, fluctuate widely as
the first four bands are added, then remain stable through 10 bands.
Statistical cleaning contributes little to the actual training set accu-
racy of most classes except mixed urban.
APPENDIX E


MAXIMUM LIKELIHOOD CLASSIFIER

The maximum likelihood ratioing technique (GLIKE in CSU's
RECOG) allows a different covariance matrix for each class. We
assume our groups are multivariate normally distributed populations
represented by data samples. Each population may be described
mathematically by its mean vector, $\mu$, and its covariance matrix,
$\Sigma$. (Suppose we only have three variates; the populations can then
be shown pictorially as in Fig. E-1.) The hyper-ellipsoid (the i-th
class, defined by $\mu_i$ and $\Sigma_i$) to which each data sample belongs
is best defined by the Gaussian probability density function, expressed
in matrix form as

$$ P(X \mid C_i) = \frac{1}{(2\pi)^{N/2} |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (X - \mu_i)^T \Sigma_i^{-1} (X - \mu_i) \right] $$

where X is the observation vector,
N is the vector dimension size,
$\mu_i$ is the mean vector for class i, and
$\Sigma_i$ is the covariance matrix for class i.
Defining:

$$ d(X \mid C_i) = \ln P(X \mid C_i) = -\frac{N}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| - \frac{1}{2} (X - \mu_i)^T \Sigma_i^{-1} (X - \mu_i) $$

then the decision function is:

if $d(X \mid A) > d(X \mid B)$ for all $A \neq B$,
X is identified as belonging to class A.

In GLIKE, we also can set a minimum acceptance threshold for
computed $P(X \mid C_i)$ values (Smith, Miller and Ells, 1972).

However, the maximum likelihood ratio approach cannot be per-
formed if the covariance matrices are singular, because the probability
of a data point belonging to a class cannot be computed if
$\Sigma_i^{-1}$ does not exist.
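
The decision rule above can be sketched with NumPy as follows. This is an illustration only, not the GLIKE implementation: the function names, the synthetic two-band training data and the threshold value are assumptions, and the acceptance threshold is applied here to the log-density d(X | C_i) rather than to P(X | C_i) itself.

```python
import numpy as np


def fit_signatures(X, y):
    """Estimate a mean vector and covariance matrix for each class."""
    signatures = {}
    for k in np.unique(y):
        Xk = X[y == k]
        signatures[int(k)] = (Xk.mean(axis=0), np.cov(Xk, rowvar=False))
    return signatures


def log_density(x, mean, cov):
    """d(X | C_i): natural log of the Gaussian probability density."""
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        # The inverse covariance matrix does not exist (or is not usable).
        raise ValueError("covariance matrix is singular or not positive definite")
    diff = x - mean
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    n = len(mean)
    return -0.5 * n * np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * mahalanobis


def classify(x, signatures, threshold=None):
    """Return the class with the largest d(X | C_i), or None if below threshold."""
    scores = {k: log_density(x, m, c) for k, (m, c) in signatures.items()}
    best = max(scores, key=scores.get)
    if threshold is not None and scores[best] < threshold:
        return None
    return best


# Synthetic two-band training data for two hypothetical classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([30.0, 10.0], 2.0, size=(60, 2)),
               rng.normal([10.0, 40.0], 2.0, size=(60, 2))])
y = np.array([0] * 60 + [1] * 60)
signatures = fit_signatures(X, y)

print(classify(np.array([29.0, 11.0]), signatures))
print(classify(np.array([1000.0, 1000.0]), signatures, threshold=-50.0))
```

Working with the log-density avoids numerical underflow for cells far from every class mean, and the explicit determinant check mirrors the remark that the method fails when a covariance matrix is singular.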
Fig. E-1. ILLUSTRATING DATA GROUPS IN THREE DIMENSIONAL
SPACE. Ellipsoids represent the covariance boundaries
(from Maxwell, 1974).


The decision to classify a sample point $x_j$ as class A rather
than class B is made according to the equation

$$ \text{if } \frac{P(x_j \mid A)}{P(x_j \mid B)} \geq 1 \text{ for } A \neq B, \text{ decide } A. $$

For simplicity, we can use an exponent test obtained by taking
the natural logarithm.



